
6.2 Advantages of TD Prediction Methods

TD methods can learn before an episode terminates, which is an advantage in environments with very long episodes. In continuing problems, Monte Carlo methods may not be applicable at all because there is no termination condition. Furthermore, in off-policy learning, Monte Carlo methods must ignore returns whenever exploratory actions (actions the target policy would never take) occur later in the episode, whereas TD methods can still learn from each individual non-exploratory step regardless of what happens later on.

For any fixed policy $\pi$, TD(0) has been proved to converge to $v_\pi$: in the mean for a constant step-size parameter if it is sufficiently small, and with probability 1 if the step-size parameter decreases according to the usual stochastic approximation conditions (2.7). Since both TD and Monte Carlo methods converge, a natural question is which converges faster, that is, which makes more efficient use of limited data? No one has proved mathematically that either method converges faster, and it is not even clear how to pose the question formally; however, TD methods have usually been found to converge faster than constant-α MC methods on stochastic tasks, as illustrated in Example 6.2.
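A minimal Julia sketch of the comparison in Example 6.2, assuming the standard five-state random walk: start in the center state, step left or right with equal probability, reward +1 only when exiting on the right, γ = 1, and all values initialized to 0.5. The helpers `random_walk_episode`, `td0!`, `mc!`, and `average_rms` are hypothetical names for illustration, not functions from this notebook, and the MC variant shown is every-visit.

```julia
using Random, Statistics

# Assumed setup (Example 6.2 random walk): non-terminal states 1..5 (A..E),
# start in the center, step left/right with probability 1/2, γ = 1,
# reward +1 only when the walk exits on the right, 0 otherwise.
const N_STATES = 5
const TRUE_V = [i / (N_STATES + 1) for i in 1:N_STATES]   # 1/6, 2/6, ..., 5/6

# Returns the visited non-terminal states and the terminal reward of one episode.
function random_walk_episode(rng::AbstractRNG)
    s = (N_STATES + 1) ÷ 2
    states = Int[]
    while 1 <= s <= N_STATES
        push!(states, s)
        s += rand(rng, Bool) ? 1 : -1
    end
    return states, (s > N_STATES ? 1.0 : 0.0)
end

# Online TD(0): V(S_t) ← V(S_t) + α[R_{t+1} + V(S_{t+1}) - V(S_t)].
# Intermediate rewards are 0 and the terminal state's value is 0.
function td0!(V, states, reward, α)
    for t in eachindex(states)
        target = t < length(states) ? V[states[t + 1]] : reward
        V[states[t]] += α * (target - V[states[t]])
    end
    return V
end

# Every-visit constant-α MC: V(S_t) ← V(S_t) + α[G_t - V(S_t)].
# With γ = 1 and only a terminal reward, every G_t equals that reward.
function mc!(V, states, reward, α)
    for s in states
        V[s] += α * (reward - V[s])
    end
    return V
end

rms_error(V) = sqrt(mean((V .- TRUE_V) .^ 2))

# RMS error after `episodes` episodes, averaged over `runs` independent runs.
# Re-seeding per call means both methods see exactly the same episodes.
function average_rms(update!, α; episodes = 100, runs = 100, seed = 1)
    rng = MersenneTwister(seed)
    total = 0.0
    for _ in 1:runs
        V = fill(0.5, N_STATES)
        for _ in 1:episodes
            states, reward = random_walk_episode(rng)
            update!(V, states, reward, α)
        end
        total += rms_error(V)
    end
    return total / runs
end

println("TD(0), α = 0.1:  ", average_rms(td0!, 0.1))
println("MC,    α = 0.02: ", average_rms(mc!, 0.02))
```

With these settings the TD(0) error should come out lower than the MC error, in line with the empirical finding above; this is one experiment under chosen step sizes, not a proof.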


6.6 Expected Sarsa

Consider the learning algorithm that is just like Q-learning except that, instead of the maximum over next state-action pairs, it uses the expected value, taking into account how likely each action is under the current policy. That is, consider the algorithm with the update rule

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left [ R_{t+1} + \gamma \text{E}_\pi [Q(S_{t+1}, A_{t+1})|S_{t+1}] - Q(S_t, A_t) \right ]$$

$$= Q(S_t, A_t) + \alpha \left [ R_{t+1} + \gamma \sum_a \pi(a|S_{t+1})Q(S_{t+1}, a) - Q(S_t, A_t) \right ]$$

but that otherwise follows the scheme of Q-learning. Given the next state, $S_{t+1}$, this algorithm moves deterministically in the same direction as Sarsa moves in expectation, and accordingly it is called Expected Sarsa. Although it is more computationally complex than Sarsa, it eliminates the variance due to the random selection of $A_{t+1}$.

In general, Expected Sarsa might use a policy different from the target policy $\pi$ to generate behavior, in which case it becomes an off-policy algorithm. For example, suppose $\pi$ is the greedy policy while behavior is more exploratory; then Expected Sarsa is exactly Q-learning. In this sense Expected Sarsa subsumes and generalizes Q-learning while reliably improving over Sarsa.
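The update rule above comes down to how the bootstrap target is computed. A small Julia sketch, assuming tabular values where `q_next` holds $Q(S_{t+1}, \cdot)$ for every action and $\pi$ is $\epsilon$-greedy; all function names are hypothetical. It checks two claims from the text: the Expected Sarsa target is the $\pi$-weighted average of the Sarsa targets (so no sampling of $A_{t+1}$ is needed), and with a greedy $\pi$ ($\epsilon = 0$) it reduces to the Q-learning target.

```julia
# Hypothetical helpers; q_next holds Q(S_{t+1}, ·) for every action.

# ε-greedy probabilities: ε/|A| on every action, plus 1 - ε on the greedy one
# (ties go to the first maximal action, as returned by argmax).
function epsilon_greedy_probs(q_next::AbstractVector{<:Real}, ε::Real)
    p = fill(ε / length(q_next), length(q_next))
    p[argmax(q_next)] += 1 - ε
    return p
end

# Sarsa target for one sampled next action a_next.
sarsa_target(r, γ, q_next, a_next) = r + γ * q_next[a_next]

# Expected Sarsa target: R_{t+1} + γ Σ_a π(a|S_{t+1}) Q(S_{t+1}, a).
expected_sarsa_target(r, γ, q_next, π_next) = r + γ * sum(π_next .* q_next)

# Q-learning target: R_{t+1} + γ max_a Q(S_{t+1}, a).
q_learning_target(r, γ, q_next) = r + γ * maximum(q_next)

q_next = [1.0, 3.0, 2.0]
p = epsilon_greedy_probs(q_next, 0.1)

# Expected Sarsa equals the π-weighted average of the Sarsa targets.
println(expected_sarsa_target(-1.0, 0.9, q_next, p))
println(sum(p[a] * sarsa_target(-1.0, 0.9, q_next, a) for a in eachindex(q_next)))

# With a greedy target policy (ε = 0) it coincides with the Q-learning target.
println(expected_sarsa_target(-1.0, 0.9, q_next, epsilon_greedy_probs(q_next, 0.0)))
println(q_learning_target(-1.0, 0.9, q_next))
```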


Double Learning Implementation


Consider a square $l \times l$ gridworld in which the reward for each step is -1.2 or 1.0 with equal probability. There is no wind and the allowed moves are just up, down, left, and right. The start is the lower left corner and the goal is the upper right corner. The expected reward per step is $(-1.2 + 1.0)/2 = -0.1$, so the optimal policy is to move to the goal as quickly as possible, which takes $2(l-1)$ steps. For a $3 \times 3$ grid this is 4 steps, so with no discounting $\mathbb{E}[G_0] = 4 \times (-0.1) = -0.4$.

Because the positive reward is so much larger than the expected value, we might expect a large maximization bias to mislead the training method into favoring long episodes that appear to have positive expected value. Below are example solutions after thousands of episodes for each of the previously discussed methods. The first solution shown is the correct optimal policy and value function, computed with value iteration.
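The maximization bias described above can be seen without any gridworld at all. A hypothetical Julia sketch, not the notebook's experiment: give 4 actions (matching up, down, left, right) the same step-reward distribution, -1.2 or 1.0 with equal probability, estimate each action's mean from 10 samples, and compare the maximum of those estimates with a double estimate that selects the action with one set of samples and evaluates it with an independent set. The sample count, trial count, and seed are assumptions chosen for illustration.

```julia
using Random, Statistics

# Assumed setup: every action has the step-reward distribution from the text,
# -1.2 or +1.0 with equal probability, so every true action value is -0.1.
noisy_step_reward(rng::AbstractRNG) = rand(rng, Bool) ? 1.0 : -1.2

# Compare two estimates of max_a E[reward of a], whose true value is -0.1:
#   single: maximum over sample means (biased upward)
#   double: pick argmax with one sample set, evaluate it with an independent set
function estimator_bias(; n_actions = 4, samples = 10, trials = 10_000, seed = 42)
    rng = MersenneTwister(seed)
    single_est = 0.0
    double_est = 0.0
    for _ in 1:trials
        m1 = [mean(noisy_step_reward(rng) for _ in 1:samples) for _ in 1:n_actions]
        m2 = [mean(noisy_step_reward(rng) for _ in 1:samples) for _ in 1:n_actions]
        single_est += maximum(m1)
        double_est += m2[argmax(m1)]
    end
    return single_est / trials, double_est / trials
end

single_est, double_est = estimator_bias()
println("max of sample means: ", single_est)
println("double estimate:     ", double_est)
```

The first number should land above zero even though every action's true value is -0.1, which is exactly the bias that can make long, noisy episodes look attractive; the second should stay close to -0.1.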


Exercise 6.13

What are the update equations for Double Expected Sarsa with an $\epsilon$-greedy target policy?

For Q-learning, the action-value update equation is:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t) \right]$$

For Expected Sarsa, the action-value update equation is:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \sum_a \pi(a|S_{t+1}) Q(S_{t+1}, a) - Q(S_t, A_t) \right]$$

For Double Q-learning, the twin action-value update equations are:

$$Q_1(S_t, A_t) \leftarrow Q_1(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q_2\big(S_{t+1}, \arg\max_a Q_1(S_{t+1}, a)\big) - Q_1(S_t, A_t) \right]$$

$$Q_2(S_t, A_t) \leftarrow Q_2(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q_1\big(S_{t+1}, \arg\max_a Q_2(S_{t+1}, a)\big) - Q_2(S_t, A_t) \right]$$
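The twin Double Q-learning updates above can be sketched as one step that flips a fair coin to decide which table to update. In this hypothetical Julia helper, `Q1` and `Q2` are assumed to be matrices indexed `[state, action]` with integer state and action indices; the function name and argument order are illustrative and not taken from the notebook.

```julia
using Random

# Hypothetical sketch: one Double Q-learning step on tabular estimates
# Q1, Q2 (matrices indexed [state, action]); states/actions are integer indices.
# Returns 1 or 2 to report which table was updated.
function double_q_update!(Q1, Q2, s, a, r, s_next, γ, α, terminal, rng::AbstractRNG)
    # Fair coin: update Q1 (evaluated by Q2) or Q2 (evaluated by Q1).
    Qa, Qb = rand(rng) < 0.5 ? (Q1, Q2) : (Q2, Q1)
    # Select the greedy next action with the table being updated, evaluate it with the other.
    bootstrap = terminal ? 0.0 : Qb[s_next, argmax(@view(Qa[s_next, :]))]
    Qa[s, a] += α * (r + γ * bootstrap - Qa[s, a])
    return Qa === Q1 ? 1 : 2
end

rng = MersenneTwister(0)
Q1 = zeros(2, 2); Q2 = zeros(2, 2)
Q1[2, :] = [1.0, 4.0]   # Q1 prefers action 2 in state 2
Q2[2, :] = [6.0, 2.0]   # Q2 prefers action 1 in state 2
which = double_q_update!(Q1, Q2, 1, 1, 0.0, 2, 1.0, 0.5, false, rng)
println("updated Q", which, ": Q1[1,1] = ", Q1[1, 1], ", Q2[1,1] = ", Q2[1, 1])
```

Because the two tables disagree about the best next action, the bootstrap value depends on which table does the selecting; that decoupling of selection from evaluation is what removes the upward bias of a single max.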

For Double Expected Sarsa we keep two action-value estimates, as in Double Q-learning, but the bootstrap term is an expected value under each estimate's target policy. Here that target policy is $\epsilon$-greedy rather than greedy as in Q-learning. The expectation uses the action probabilities from the estimate being updated but the action values from the other estimate:

With probability 0.5, update $Q_1$:

$$Q_1(S_t, A_t) \leftarrow Q_1(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \sum_a \pi_1(a|S_{t+1}) Q_2(S_{t+1}, a) - Q_1(S_t, A_t) \right]$$

where $\pi_1$ is $\epsilon$-greedy with respect to $Q_1$.

Otherwise (with probability 0.5), update $Q_2$:

$$Q_2(S_t, A_t) \leftarrow Q_2(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma \sum_a \pi_2(a|S_{t+1}) Q_1(S_{t+1}, a) - Q_2(S_t, A_t) \right]$$

where $\pi_2$ is $\epsilon$-greedy with respect to $Q_2$.
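The two Double Expected Sarsa updates above can be sketched as one step that flips a fair coin to choose the table to update. In this hypothetical Julia helper, `Q1` and `Q2` are assumed to be matrices indexed `[state, action]` with integer indices; the names and argument order are illustrative. The table being updated supplies the $\epsilon$-greedy probabilities and the other table supplies the action values, matching the equations.

```julia
using Random

# ε-greedy probabilities over the action values q (ties go to the first maximal action).
function eps_greedy(q::AbstractVector{<:Real}, ε::Real)
    p = fill(ε / length(q), length(q))
    p[argmax(q)] += 1 - ε
    return p
end

# Hypothetical sketch: one Double Expected Sarsa step on tabular estimates
# Q1, Q2 (matrices indexed [state, action]). The table being updated supplies
# the ε-greedy probabilities; the other table supplies the action values.
# Returns 1 or 2 to report which table was updated.
function double_expected_sarsa_update!(Q1, Q2, s, a, r, s_next, γ, α, ε, terminal,
                                       rng::AbstractRNG)
    Qa, Qb = rand(rng) < 0.5 ? (Q1, Q2) : (Q2, Q1)
    bootstrap = 0.0
    if !terminal
        probs = eps_greedy(@view(Qa[s_next, :]), ε)      # π_a(·|S_{t+1})
        bootstrap = sum(probs .* @view(Qb[s_next, :]))    # Σ_a π_a(a|S_{t+1}) Q_b(S_{t+1}, a)
    end
    Qa[s, a] += α * (r + γ * bootstrap - Qa[s, a])
    return Qa === Q1 ? 1 : 2
end

rng = MersenneTwister(1)
Q1 = zeros(2, 3); Q2 = zeros(2, 3)
Q1[2, :] = [0.0, 2.0, 1.0]
Q2[2, :] = [3.0, 6.0, 9.0]
which = double_expected_sarsa_update!(Q1, Q2, 1, 1, 0.0, 2, 1.0, 1.0, 0.3, false, rng)
println("updated Q", which, ": Q1[1,1] = ", Q1[1, 1], ", Q2[1,1] = ", Q2[1, 1])
```

Ties in `argmax` go to the first maximal action here; a full agent might prefer random tie-breaking, and its behavior policy would typically be $\epsilon$-greedy with respect to $Q_1 + Q_2$.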


Batch Training on the Random Walk Task

$\alpha$: 0.01
Number of states: 5
Maximum episodes: 100


Example 6.8: Noisy Gridworld


Windy Gridworld Solutions with Q-Learning

$$\cdots \: S_t, \: A_t, \: R_{t+1}, \: S_{t+1}, \: A_{t+1}, \: R_{t+2}, \: S_{t+2}, \: A_{t+2}, \: R_{t+3}, \: S_{t+3} \: \cdots$$


Actions

Wind Values

-11.0

-12.0

-11.0

-12.0

-13.0

-11.0

-9.9

-11.0

-12.0

-13.0

-9.5

-8.8

-10.0

-12.0

-13.0

-7.7

-8.6

-11.0

-12.0

-13.0

-14.0

-6.4

-6.8

-10.0

-11.0

-13.0

-14.0

-4.8

-5.6

-6.5

-10.0

-11.0

-13.0

-4.4

-2.6

-5.2

-8.4

-12.0

-6.0

-5.0

-8.1

0.0

-9.8

-11.0

-10.0

-6.7

-5.8

-5.1

-7.7

-7.6

-8.1

-9.1

-6.9

-7.3

-7.0

-5.5

-6.1

-7.1

-8.3

Actions

Wind Values


Q-learning Instability at Higher Learning Rate

Learning rate $\alpha$: 0.3

[Figure panels (Actions, Wind Values) for: Value Iteration Solution, Sarsa Solution, Expected Sarsa Solution, Double Expected Sarsa Solution, Q-learning Solution, Double Q-learning Solution]

Let $\mathcal{M}$ be the set of labels $j$ for which $X_j$ has the maximal expected value:

$$\mathcal{M} \doteq \left \{ j \mid \mathbb{E} \{ X_j \} = \max_i \mathbb{E} \{ X_i \} \right \}$$

Let $Max(S)$ be the set of labels of the estimators that yield the maximum estimate for a given set of samples $S$:

$$Max(S) \doteq \left \{ j \mid \mu_j(S) = \max_i \mu_i(S) \right \}$$

Assuming each $\mu_i$ is an unbiased estimator of $\mathbb{E} \{ X_i \}$, the claim is that for all $j \in \mathcal{M}$

$$\mathbb{E} \{ \max_i \mu_i \} \geq \mathbb{E} \{ \mu_j \} = \mathbb{E} \{ X_j \} \doteq \max_i \mathbb{E} \{ X_i \} \tag{d}$$

Proof. Let $j \in \mathcal{M}$, i.e. $\mu_j$ is any estimator whose variable has the maximal expected value, and write $Max$ for $Max(S)$. Conditioning on whether $j \in Max$ gives

$$\begin{flalign} \mathbb{E} \{ \max_i \mu_i \} &= P(j \in Max) \mathbb{E} \{ \max_i \mu_i \vert j \in Max \} + P(j \notin Max) \mathbb{E} \{ \max_i \mu_i \vert j \notin Max \} \\ &= P(j \in Max) \mathbb{E} \{\mu_j \vert j \in Max \} + P(j \notin Max) \mathbb{E} \{ \max_i \mu_i \vert j \notin Max \} \\ &\geq P(j \in Max) \mathbb{E} \{\mu_j \vert j \in Max \} + P(j \notin Max) \mathbb{E} \{ \mu_j \vert j \notin Max \} \\ &=\mathbb{E} \{ \mu_j \} = \mathbb{E} \{X_j\} \doteq \max_i \mathbb{E} \{ X_i \} \end{flalign}$$

The third line follows from the definition of $Max$: whenever $j \notin Max$ we have $\max_i \mu_i > \mu_j$, so $\mathbb{E} \{ \max_i \mu_i \vert j \notin Max \} \gt \mathbb{E} \{ \mu_j \vert j \notin Max \}$. Therefore the inequality is strict if and only if $P(j \notin Max) \gt 0$ for some $j \in \mathcal{M}$. If we do not know whether this is the case, we do not know whether the inequality in $(d)$ is strict, so in general we write $\mathbb{E} \{ \max_i \mu_i \} \geq \max_i \mathbb{E} \{ \mu_i \}$, which proves the claim.

Recall that $j$ is assumed to be in the set $\mathcal{M}$, meaning its variable has a maximal expected value, while the set $Max(S)$ contains the estimators that produce the maximum estimate over a sample $S$. Intuitively, then, the proof says that the expected value of the maximum of the estimators always has a positive bias, unless there is zero probability that the estimators producing the highest estimates on a given sample differ from the true set of maximizing variables. So unless the distributions of the estimates have zero overlap (in which case the ranking of the estimates always matches the ranking of the true expected values), there is always a positive expected bias.
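A short simulation illustrates the bias. This is a minimal sketch rather than the notebook's code: it assumes each $X_i$ is normally distributed with unit variance and that $\mu_i$ is the sample mean of `n` draws; `max_estimator_bias` and its keyword arguments are hypothetical names.

```julia
using Random, Statistics

# Hypothetical helper (not from the notebook): Monte Carlo estimate of
# E{max_i μ_i} - max_i E{X_i}, assuming X_i ~ Normal(means[i], 1) and
# μ_i = sample mean of n draws of X_i.
function max_estimator_bias(means; n = 5, trials = 20_000, seed = 42)
    rng = MersenneTwister(seed)
    # one entry per trial: the largest of the k sample means
    maxes = [maximum(mean(m .+ randn(rng, n)) for m in means) for _ in 1:trials]
    return mean(maxes) - maximum(means)
end

max_estimator_bias(zeros(10))     # clearly positive: any of the ten estimators can win
max_estimator_bias([0.0, 100.0])  # close to 0: the second estimator essentially always wins
```

In the second case $P(j \notin Max)$ is negligible for the maximizing variable, which is the near-zero-overlap situation described above.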

Consider instead policy improvement with afterstate value estimates $W_\pi(y)$ where we seek to choose a policy that is greedy with respect to the afterstate values:

$$\pi^\prime(s) = \mathrm{argmax}_a \left( f_2(s, a) + W_\pi(f_1(s, a)) \right)$$

where $f_1$ and $f_2$ are the deterministic functions defined above that determine which afterstate is reached from $(s, a)$ and what intermediate reward is received. This looks much like policy improvement with $Q(s, a)$, and that is because $Q_\pi(s, a) = f_2(s, a) + W_\pi(f_1(s, a))$. So, if we use afterstates, we get the benefits of learning the state-action value function while only storing values for the afterstates. The functions $f_1$ and $f_2$ provide all the extra information needed to recover those values.
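As a small illustration of how $f_1$, $f_2$ and $W_\pi$ combine, the sketch below assumes a tabular problem where states, actions and afterstates are integer indices, `f1` and `f2` are stored as matrices indexed `[s, a]`, and `W` is a vector over afterstates. The helper names `afterstate_q` and `greedy_afterstate_policy` are hypothetical, not functions from this notebook.

```julia
# Hypothetical tabular representation (an assumption, not the notebook's types):
#   f1[s, a] -> index of the afterstate reached from (s, a)
#   f2[s, a] -> deterministic intermediate reward for (s, a)
#   W[y]     -> afterstate value estimate
afterstate_q(f1, f2, W, s, a) = f2[s, a] + W[f1[s, a]]   # Q(s, a) = f2(s, a) + W(f1(s, a))

# π′(s) = argmax_a (f2(s, a) + W(f1(s, a))) for every state s
greedy_afterstate_policy(f1, f2, W) =
    [argmax([afterstate_q(f1, f2, W, s, a) for a in 1:size(f1, 2)]) for s in 1:size(f1, 1)]

f1 = [1 2; 2 3]            # (1, 2) and (2, 1) lead to the same afterstate 2
f2 = [0.0 -1.0; 0.0 0.0]
W  = [0.0, 5.0, 1.0]
greedy_afterstate_policy(f1, f2, W)   # [2, 1]
```

Because $(1, 2)$ and $(2, 1)$ share afterstate 2, they share the stored value $W(2) = 5$; only the deterministic rewards in `f2` distinguish them.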

Continuing the comparison to value iteration, recall that we adapted the Bellman optimality equation for the state value function to have a single update rule to estimate $V^*(s)$:

$$V^*(s) = \max_a Q^*(s, a) = \max_a \sum_{r, s^\prime} p(r, s^\prime \vert s, a) (r + \gamma V^*(s^\prime))$$

We can only apply this update rule if we have $p(r, s^\prime \vert s, a)$ or if we instead estimate $Q^*$ and sample the transitions from the environment. To estimate $W^*(y)$, we need to represent the Bellman optimality equation for the afterstate value function instead of the state value function:

$$\begin{flalign} W^*(y) &= \sum_{r, s^\prime} p(r, s^\prime \vert y)(r + \gamma \max_a(f_2(s^\prime, a) + W^*(f_1(s^\prime, a)))) \\ &= \sum_{r, s^\prime} p(r, s^\prime \vert y)r + \gamma \sum_{s^\prime} p(s^\prime \vert y) \max_a(f_2(s^\prime, a) + W^*(f_1(s^\prime, a))) \end{flalign}$$

where $p(s^\prime \vert y) = \sum_r p(r, s^\prime \vert y)$.

The outer sum just represents an expected value over the transition out of $y$, so if we don't have access to $p(r, s^\prime \vert y)$, we could sample the transitions from the environment. The $\max_a$ term can now be calculated explicitly: for each next state $s^\prime$ it is a maximum over a vector of action values, and it does not depend on the reward. With state values, the maximization step requires evaluating a sum over $(r, s^\prime)$ for every action, so each update with afterstates is less costly. Also, the afterstates themselves might be more informative in the sense that they all have distinct values. If many of the actions from a given state lead to the same afterstate, this method immediately treats them all as equal, whereas with the usual value iteration that equivalence would have to be calculated through the transition probability function. The benefits of using an afterstate value function depend entirely on how effectively the environment transitions can be separated into informative deterministic steps and limited stochastic dynamics.
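To check numerically that the afterstate equation above has the same fixed point as the state-value Bellman optimality equation, the sketch below runs both forms of value iteration on a small randomly generated problem. It is a minimal sketch under stated assumptions, not the notebook's car rental implementation: `rbar[y]` stands for $\sum_{r, s^\prime} p(r, s^\prime \vert y) r$, `P[y, s′]` for $p(s^\prime \vert y)$, and every function and variable name is hypothetical.

```julia
using Random, LinearAlgebra

# Hypothetical toy problem in afterstate form (not the notebook's car rental MDP):
#   f1[s, a] -> afterstate reached from (s, a), f2[s, a] -> deterministic reward,
#   rbar[y]  -> expected reward of the stochastic step out of afterstate y,
#   P[y, s′] -> probability p(s′ | y) of reaching state s′ from afterstate y.
function afterstate_value_iteration(f1, f2, rbar, P, γ; tol = 1e-10)
    nS, nA = size(f1)
    W = zeros(length(rbar))
    while true
        # max_a (f2(s′, a) + W(f1(s′, a))) for every s′: a maximum over a vector, no inner sum
        q = [maximum(f2[s, a] + W[f1[s, a]] for a in 1:nA) for s in 1:nS]
        Wnew = rbar .+ γ .* (P * q)                # expectation over s′ given y
        maximum(abs.(Wnew .- W)) < tol && return Wnew
        W = Wnew
    end
end

# Ordinary state value iteration on the same dynamics: every action needs a sum over s′.
function state_value_iteration(f1, f2, rbar, P, γ; tol = 1e-10)
    nS, nA = size(f1)
    backup(Vcur, s, a) = f2[s, a] + rbar[f1[s, a]] + γ * dot(P[f1[s, a], :], Vcur)
    V = zeros(nS)
    while true
        Vnew = [maximum(backup(V, s, a) for a in 1:nA) for s in 1:nS]
        maximum(abs.(Vnew .- V)) < tol && return Vnew
        V = Vnew
    end
end

rng = MersenneTwister(0)
nS, nA, nY, γ = 4, 3, 5, 0.9
f1 = rand(rng, 1:nY, nS, nA)
f2 = randn(rng, nS, nA)
rbar = randn(rng, nY)
P = rand(rng, nY, nS)
P ./= sum(P, dims = 2)                             # each row is a distribution over s′

W = afterstate_value_iteration(f1, f2, rbar, P, γ)
V = state_value_iteration(f1, f2, rbar, P, γ)
# Recover V*(s) = max_a (f2(s, a) + W*(f1(s, a))) from the afterstate values
V_from_W = [maximum(f2[s, a] + W[f1[s, a]] for a in 1:nA) for s in 1:nS]
```

The afterstate sweep needs one matrix-vector product and a maximum over each row of action values, while the state sweep recomputes a sum over next states inside the maximum for every action, which is the cost difference described above.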

Tabular TD(0) Implementation

Exercise 6.3

From the results shown in the left graph of the random walk example it appears that the first episode results in a change only in $V(A)$. What does this tell you about what happened on the first episode? Why was only the estimate for this one state changed? By exactly how much was it changed?

The update rule with TD(0) learning is given by

$$V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$$

All nonterminal states A, B, C, D, E are initialized to 0.5, and the terminal states have value 0. During the first episode, every transition before the last one has reward 0 and moves between two states that both still have value 0.5, so the TD error is 0 and the value function does not change. Since the value estimate for state A decreased from its initial value, the first episode must have terminated on the left. For this final transition we have the following update:

$$V(A) \leftarrow V(A) + \alpha[0 + \gamma V(\text{Term}) - V(A)]$$

We know that prior to the update $V(A) = 0.5$, $V(\text{Term}) = 0$ and $\gamma=1$ so the update is

$$V(A) \leftarrow 0.5 + \alpha[0 - 0.5]$$

For this plot, $\alpha=0.1$, so the updated value for $V(A)$ is $0.5+0.1(-0.5)=0.5-0.05=0.45$
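The same arithmetic can be reproduced by replaying a first episode with TD(0). This sketch assumes a plain vector layout (indices 1 and 7 for the left and right terminal states, 2 to 6 for A to E) and a made-up trajectory that ends on the left; `td0_episode!` is a hypothetical helper, not a function from this notebook.

```julia
# Hypothetical helper (not from the notebook): apply TD(0) along one recorded
# episode. trajectory holds state indices S_0, ..., S_T and rewards[t] is the
# reward received on the transition from trajectory[t] to trajectory[t + 1].
function td0_episode!(V, trajectory, rewards; α = 0.1, γ = 1.0)
    for t in eachindex(rewards)
        s, s′ = trajectory[t], trajectory[t + 1]
        V[s] += α * (rewards[t] + γ * V[s′] - V[s])
    end
    return V
end

# Assumed layout: index 1 = left terminal, 2:6 = A to E, 7 = right terminal.
V = [0.0; fill(0.5, 5); 0.0]
# A made-up first episode that starts in C and exits on the left: C, B, C, B, A, terminal
trajectory = [4, 3, 4, 3, 2, 1]
rewards = zeros(length(trajectory) - 1)   # every reward on a left exit is 0
td0_episode!(V, trajectory, rewards)
V[2:6]   # ≈ [0.45, 0.5, 0.5, 0.5, 0.5]: only V(A) moved, by α(0 - 0.5) = -0.05
```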

Random Walk MDP Setup

Figure 6.3 Interim and asymptotic performance of TD control methods on the cliff-walking task as a function of $\alpha$. Dashed lines show interim performance and solid lines show asymptotic performance.

Comparison of all learning methods with their double-estimator counterparts on the simple MDP described in Section 6.7. Q-learning initially learns to take the left action much more often than the right action, and always takes it significantly more often than the 5% minimum probability enforced by $\epsilon$-greedy action selection with $\epsilon$=0.1. In contrast, Double Q-learning is essentially unaffected by maximization bias, as is Double Expected Sarsa. Sarsa and Expected Sarsa exhibit maximization bias as well. All of the Sarsa methods eventually take the left action more often than Q-learning, even though all of them use the same kind of $\epsilon$-greedy behavior policy. Even Double Expected Sarsa, which has no maximization bias, shows the same tendency. The only difference between this method and Double Q-learning is the use of the $\epsilon$-greedy policy in the value calculation, so its action value estimates are for the $\epsilon$-greedy policy rather than for the greedy policy used by Double Q-learning. Under this policy, an intended right action is sometimes replaced by a left action, and vice versa. Even under the $\epsilon$-greedy policy the optimal choice is still to go right, but because $\epsilon$ adds variance to the value estimates, the behavior policy derived from the Q values takes longer to converge to the correct values. That slower convergence is apparent in the graph above.
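The difference between the two double-estimator targets can be made explicit. The sketch below shows one common form of each target for updating `Q1`, with `Q2` supplying the evaluation (the full algorithms swap the roles at random); whether the notebook's Double Expected Sarsa builds its $\epsilon$-greedy policy from `Q1` alone is an assumption, and the function names are hypothetical.

```julia
# Hypothetical helpers: targets for updating Q1 in the double-estimator methods,
# with Q2 providing the evaluation (the full algorithms swap the roles at random).
# Q1 and Q2 are matrices indexed [state, action]; s′ is the next state.
function double_q_target(Q1, Q2, r, s′, γ)
    a_greedy = argmax(Q1[s′, :])          # choose the action with Q1 ...
    return r + γ * Q2[s′, a_greedy]       # ... but evaluate it with Q2
end

function double_expected_sarsa_target(Q1, Q2, r, s′, γ, ε)
    nA = size(Q1, 2)
    probs = fill(ε / nA, nA)              # ε-greedy probabilities built from Q1 (assumption)
    probs[argmax(Q1[s′, :])] += 1 - ε
    return r + γ * sum(probs .* Q2[s′, :])  # expected Q2 value under that policy
end

Q1 = [0.0 0.0; 1.0 0.0]
Q2 = [0.0 0.0; 0.5 2.0]
double_q_target(Q1, Q2, 0.0, 2, 1.0)                    # 0.5
double_expected_sarsa_target(Q1, Q2, 0.0, 2, 1.0, 0.1)  # ≈ 0.575 = 0.95 * 0.5 + 0.05 * 2.0
```

With $\epsilon = 0$ the two targets coincide, which is the sense in which the $\epsilon$-greedy expectation is the only difference between the two methods.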

Value iteration with afterstates converged in 10 fewer steps than state value iteration, and its total runtime was less than 25% of that of state value iteration. So, as expected, the afterstate method converges in fewer steps, each of which is cheaper to compute than a step using the state value function.

[Figure panels (Actions, Wind Values) for: Sarsa Solution, Value Iteration Solution]

[Figure panels: Actions, Wind Values]

For Monte Carlo learning, each state estimate is updated with the error shown by the red arrows, but only after the episode has finished. For TD(0) learning, the error for a state can be calculated as soon as the reward and the next state are observed, and it is based only on the new information from one step into the future.

Exercise 6.2

This is an exercise to help develop your intuition about why TD methods are often more efficient than Monte Carlo methods. Consider the driving home example and how it is addressed by TD and Monte Carlo methods. Can you imagine a scenario in which a TD update would be better on average than a Monte Carlo update? Give an example scenario - a description of past experience and a current state - in which you would expect the TD update to be better. Here's a hint: Suppose you have lots of experience driving home from work. Then you move to a new building and a new parking lot (but you still enter the highway at the same place). Now you are starting to learn predictions for the new building. Can you see why TD updates are likely to be much better, at least initially, in this case? Might the same sort of thing happen in the original scenario?

Originally, the expected total time to reach home from the starting state is 30 minutes. Now suppose the route changes so that it takes, on average, 5 more minutes to reach the car, while the expected time for every other leg of the journey is unchanged. Our total time estimate from the starting state should now be 35 minutes on average. Say we reach the car and nothing out of the ordinary is happening. The predicted time to go will be 25 minutes and the predicted total time will be 35 minutes. If nothing further out of the ordinary occurs, then only the estimate for the first state will be corrected. With the Monte Carlo method, the first state is also the only one with an estimate error, but its update will not occur until after we have arrived at our destination. Either way, the next time we drive we will have a new, more accurate estimate that reflects the longer time required to reach the car.

In the example, several events occur during the drive that change the predicted and actual times from the average. For simplicity, suppose that when we enter our home street a garbage truck is blocking our path. Normally the last leg takes only 3 minutes, but with the truck present we estimate it will take 5 minutes (2 minutes longer), so the predicted total time increases from 35 to 37 minutes. With Monte Carlo learning, this additional 2 minutes propagates back to all of the previous states, because we experienced a true travel time of 37 minutes rather than the 35 minutes predicted after the 2nd state and the 30 minutes predicted from the first state. With TD(0) learning, however, a single update applies this delay only to the state immediately before it, effectively increasing the predicted time for the final leg of the journey only. The prediction from the starting state increases only by the extra 5 minutes of the walk to the car, not by the delay from the garbage truck. Since we really are starting from a new point, that feedback will be consistent and reflects a true change in the expected time from the starting state. The garbage truck, however, may be a rare occurrence. By the time this change propagates back through the states to the starting state, much more experience will have accumulated at all of the other states, and for any reasonable $\alpha$ this delay will count for much less than the updates from the first leg of the journey. Because TD(0) immediately uses feedback from only one step into the future, changes to the environment immediately affect only the most closely related states. In this example, all of the accurate predictions we still have about the later legs of the journey keep the predictions more stable.

The opposite extreme, though, could create a situation in which the Monte Carlo updates are better. Imagine instead that you moved to a new house in the same neighborhood, so that once you enter the home street it takes 5 minutes to reach your home instead of 3. In this case, the Monte Carlo updates would move every state's prediction toward a value 2 minutes higher, since all of the predictions would be too short. The TD(0) update, though, would initially increase only the prediction for the final leg of the journey, and we would have to wait for this change to propagate back to all the other states. So the efficiency of each method's updates depends on where in the episode the environmental changes occur.
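The garbage-truck scenario can be written out as explicit errors. In this sketch the leg durations and predictions are hypothetical numbers chosen to match the story above (office predicted at 30 minutes to go, car at 25, home street at 3, and a last leg that takes 5 minutes instead of 3), with $\gamma = 1$ and the time per leg playing the role of the reward; `prediction_errors` is a hypothetical helper.

```julia
# Hypothetical helper: Monte Carlo and TD(0) errors for one trip with γ = 1, where the
# "reward" for each leg is the minutes it took. predictions[k] is the predicted time to
# go from the k-th state and legs[k] is the time from the k-th state to the next one;
# arriving home is terminal with value 0.
function prediction_errors(predictions, legs)
    n = length(legs)
    V = [predictions; 0.0]                     # append the terminal value
    returns = reverse(cumsum(reverse(legs)))   # actual remaining time from each state
    mc = returns .- V[1:n]                     # Monte Carlo errors G - V(S)
    td = legs .+ V[2:n+1] .- V[1:n]            # TD(0) errors R + V(S′) - V(S)
    return mc, td
end

# Hypothetical numbers from the story: office, car, home street
predictions = [30.0, 25.0, 3.0]
legs = [10.0, 22.0, 5.0]        # 5 extra minutes to the car, 2 extra behind the truck
mc, td = prediction_errors(predictions, legs)
# mc == [7.0, 2.0, 2.0]; td == [5.0, 0.0, 2.0]
```

The Monte Carlo errors carry the truck's 2 minutes to every state (plus the 5-minute building change at the office), while TD(0) assigns the 5 minutes only to the office and the 2 minutes only to the home-street state.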

Actual environment change at the end of the route

Now there is a randomly experienced shorter leg at the start of the journey which won't affect most of the Monte Carlo updates.

[Figure: gridworld plot with panels "Actions" and "Wind Values"]


$$\begin{flalign} G_t - V_t(S_t) &= \delta_t + \gamma \eta_{t} + \gamma \left [\delta_{t+1} + \gamma \eta_{t+1} + \gamma (G_{t+2} - V_{t+2}(S_{t+2}) ) \right ] \\ &= \delta_t + \gamma \eta_{t} + \gamma \delta_{t+1} + \gamma^2 \eta_{t+1} + \gamma^2 \left [G_{t+2} - V_{t+2}(S_{t+2}) \right ] \\ &= (\delta_t + \gamma \eta_t) + \gamma (\delta_{t+1} + \gamma \eta_{t+1}) + \cdots + \gamma^{T-t-1}(\delta_{T-1} + \gamma \eta_{T-1}) + \gamma^{T-t} \left [G_T - V_T(S_T) \right ]\\ &= (\delta_t + \gamma \eta_t) + \gamma (\delta_{t+1} + \gamma \eta_{t+1}) + \cdots + \gamma^{T-t-1}(\delta_{T-1} + \gamma \eta_{T-1})\\ &=\sum_{k=t}^{T-1} \gamma^{k-t} (\delta_k + \gamma \eta_k)\\ \end{flalign}$$
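The identity can be checked numerically. This sketch assumes the definitions that the first line of the derivation relies on, δ_t = R_{t+1} + γV_t(S_{t+1}) − V_t(S_t) and η_t = V_{t+1}(S_{t+1}) − V_t(S_{t+1}), with every value table assigning 0 to the terminal state. The trajectory, rewards, and per-step value tables are arbitrary random stand-ins (hypothetical), since the identity should hold for any sequence of estimates.

```julia
using Random

Random.seed!(1)
n, T, γ = 4, 8, 0.9
# Hypothetical trajectory: S[i] holds S_{i-1}; states 1..n are nonterminal, n + 1 is terminal.
S = vcat(rand(1:n, T), n + 1)                  # S_0, ..., S_T
R = randn(T)                                   # R[i] holds R_i for i = 1..T
# Vs[i] holds the value table V_{i-1}; terminal value fixed at 0 (assumption above).
Vs = [vcat(randn(n), 0.0) for _ in 0:T]

V(t, s) = Vs[t + 1][s]                         # V_t(s)
G(t) = sum(γ^(k - t) * R[k + 1] for k in t:T-1)
δ(k) = R[k + 1] + γ * V(k, S[k + 2]) - V(k, S[k + 1])
η(k) = V(k + 1, S[k + 2]) - V(k, S[k + 2])

lhs(t) = G(t) - V(t, S[t + 1])                 # G_t - V_t(S_t)
rhs(t) = sum(γ^(k - t) * (δ(k) + γ * η(k)) for k in t:T-1)

for t in 0:T-1
    println("t = $t: lhs = $(round(lhs(t), digits = 6)), rhs = $(round(rhs(t), digits = 6))")
end
```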


To use afterstates with generalized policy iteration, we need to modify our MDP framework by considering the following trajectory:

$$(S, A) \longrightarrow (Y, P) \longrightarrow (S^\prime, R) \longrightarrow \cdots \longrightarrow (S_T, R_T)$$

where $(S, A, R)$ are the usual state, action, and reward. We introduce $(Y, P)$ to indicate the afterstate and any intermediate reward that is received from the afterstate transition.

The probability transition function for a normal MDP is written as $p(s^\prime, r \vert s, a)$ and represents the probability of transitioning to state $s^\prime$ with reward $r$, given that an agent takes action $a$ from state $s$.

When using afterstates, transitions can be represented with two functions:

$$p(y, \rho \vert s, a) \tag{a}$$

is the probability of transitioning to afterstate $y$ with intermediate reward $\rho$, given that an agent takes action $a$ from state $s$, and

$$p(s^\prime, r \vert y) \tag{b}$$

is the probability of transitioning to state $s^\prime$ with reward $r$ given an agent starts in afterstate $y$.

Moreover, when an environment is modified to use afterstates, the dynamics immediately following an action are usually known and deterministic, and any stochastic behavior comes after that. A good example is tic-tac-toe, where we fully know the result of making a move, but the opponent's response may be unknown. In this situation, the afterstate probability transition (a) is deterministic, so it could instead be represented by mapping functions that return an afterstate and an intermediate reward given a state-action pair.

$$f_1(s, a) = y \tag{a1′}$$

$$f_2(s, a) = \rho \tag{a2′}$$

where $y$ and $\rho$ are the afterstate and reward respectively after taking action $a$ in state $s$. Now all of the stochastic dynamics of the environment are captured in (b) and the function only has 3 arguments instead of the usual 4. We can now apply all of the previous techniques to the afterstate example and even combine dynamic programming and trajectory sampling.
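A minimal sketch of TD(0) on afterstate values under this decomposition. The task is hypothetical and not from the text: positions 0..N, an action deterministically moves 1 to 3 positions ahead (the role of $f_1$) at an intermediate cost of −0.1 per position (the role of $f_2$), and all randomness sits in the afterstate-to-state step, standing in for (b). It also assumes γ = 1 and ε-greedy action selection, so the value of an action is taken as $\rho + V(y)$.

```julia
using Random

# Hypothetical toy task (not from the text): advance along positions 0..N.
const N = 10
const ACTIONS = 1:3
afterstate(s, a) = min(s + a, N)        # deterministic f_1(s, a) = y
move_reward(s, a) = -0.1 * a            # deterministic f_2(s, a) = ρ

# Stochastic part p(s', r | y): unless the goal is reached, a gust pushes the agent
# back one position with probability 0.5, and each such step costs -1.
function sample_next(rng, y)
    y == N && return (N, 0.0)
    s_next = rand(rng) < 0.5 ? max(y - 1, 0) : y
    return (s_next, -1.0)
end

# ε-greedy choice using afterstate values: value of action a is ρ + V(y) (γ = 1 assumed).
function choose(rng, V, s, ε)
    rand(rng) < ε && return rand(rng, ACTIONS)
    qs = [move_reward(s, a) + V[afterstate(s, a) + 1] for a in ACTIONS]
    return ACTIONS[argmax(qs)]
end

# One episode of TD(0) on afterstate values; V[y + 1] stores V(y).
function afterstate_episode!(rng, V; α = 0.1, ε = 0.1)
    s = 0
    y = afterstate(s, choose(rng, V, s, ε))
    while true
        s_next, r = sample_next(rng, y)
        if s_next == N                          # terminal: target is just r
            V[y + 1] += α * (r - V[y + 1])
            return V
        end
        a_next = choose(rng, V, s_next, ε)
        y_next = afterstate(s_next, a_next)
        target = r + move_reward(s_next, a_next) + V[y_next + 1]
        V[y + 1] += α * (target - V[y + 1])
        y = y_next
    end
end

rng = MersenneTwister(42)
V = zeros(N + 1)
for _ in 1:5000
    afterstate_episode!(rng, V)
end
println(round.(V, digits = 2))
```

Only the stochastic step is sampled; the deterministic part is evaluated directly when choosing actions, which is what lets a single value per afterstate stand in for many state-action pairs that lead to the same afterstate.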


Chapter 6 Temporal-Difference Learning

TD methods combine the Monte Carlo concept of learning from experience with the self-consistency ideas from dynamic programming. Unlike the pure Monte Carlo methods of Chapter 5, TD methods do not need to wait for the final outcome of an episode to start learning. Instead, they bootstrap: each estimate is updated in part from other learned estimates, exploiting the recursive relationship that the value function must satisfy. Eventually we will see that different degrees of bootstrapping can be used to bridge the gap between the techniques of Chapters 5 and 6.

6.1 TD Prediction


Exercise 6.4

The specific results shown in the right graph of the random walk example are dependent on the value of the step-size parameter $\alpha$. Do you think the conclusions about which algorithm is better would be affected if a wider range of values were used? Is there a different, fixed value of $\alpha$ at which either algorithm would have performed significantly better than shown? Why or why not?

Both algorithms should theoretically converge to the true values with a sufficiently small $\alpha$ and a large enough number of samples. Over this limited window of 100 episodes, however, an $\alpha$ that is too small may converge so slowly that it never reaches an error as low as a larger $\alpha$ does. For the MC method, $\alpha=0.01$ is the smallest value tested, and it has the slowest convergence over this range. $\alpha=0.04$ is the largest value tested, and it ends at approximately the same error as $\alpha=0.01$ after 100 episodes. The intermediate values perform better over this number of episodes, indicating that the best achievable performance is already captured within this interval.

For the TD method, the best results shown are for $\alpha=0.05$, which is already the smallest value tested and has the slowest convergence rate. An even smaller value might give a better result over 100 episodes, but this performance is already better than anything observed for the MC method.
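To experiment with other step sizes, here is a minimal sketch of the comparison. It assumes the setup of Example 6.2 as described in the book (five states, start at C, reward +1 only on the right exit, γ = 1, initial estimates 0.5), every-visit constant-α MC, and errors averaged over 100 runs; its printed numbers are its own and are not the values in the book's figure.

```julia
using Random, Statistics

# Random walk: states A..E are 1..5, episodes start at C (3); exiting right pays +1.
const TRUE_V = (1:5) ./ 6

# One episode: the visited nonterminal states and the final reward.
function walk(rng)
    s, states = 3, Int[]
    while 1 <= s <= 5
        push!(states, s)
        s += rand(rng, Bool) ? 1 : -1
    end
    return states, s == 6 ? 1.0 : 0.0
end

rms(V) = sqrt(mean((V .- TRUE_V) .^ 2))

# TD(0): V(S_t) += α[R_{t+1} + V(S_{t+1}) - V(S_t)]; rewards are 0 until the exit.
function td_episode!(V, states, reward, α)
    for (t, s) in enumerate(states)
        is_last = t == length(states)
        target = is_last ? reward : V[states[t + 1]]   # terminal value is 0
        V[s] += α * (target - V[s])
    end
end

# Constant-α MC (every visit, assumed): every step's return equals the final reward.
function mc_episode!(V, states, reward, α)
    for s in states
        V[s] += α * (reward - V[s])
    end
end

# Mean RMS error after each episode, averaged over independent runs.
function curve(update!, α; episodes = 100, runs = 100, seed = 0)
    rng = MersenneTwister(seed)
    err = zeros(episodes)
    for _ in 1:runs
        V = fill(0.5, 5)
        for ep in 1:episodes
            update!(V, walk(rng)..., α)
            err[ep] += rms(V) / runs
        end
    end
    return err
end

for α in (0.05, 0.1, 0.15)
    println("TD α = $α: error after 100 episodes = ", round(curve(td_episode!, α)[end], digits = 3))
end
for α in (0.01, 0.02, 0.03, 0.04)
    println("MC α = $α: error after 100 episodes = ", round(curve(mc_episode!, α)[end], digits = 3))
end
```

Extending the `α` tuples, or `episodes`, is a quick way to probe the question in the exercise.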


TD(0) update rule for action values:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma Q(S_{t+1}, A_{t+1})-Q(S_t, A_t)]$$

This update is done after every transition from a nonterminal state $S_t$. If $S_{t+1}$ is terminal, then $Q(S_{t+1}, A_{t+1})$ is defined as zero. This rule uses every element of the quintuple of events, $(S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})$, that make up a transition from one state-action pair to the next. This quintuple gives rise to the name Sarsa for the algorithm. Each update only uses the immediate reward and the value of the state-action pair in the subsequent state as illustrated in the backup diagram shown below.
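A minimal on-policy Sarsa sketch with ε-greedy actions. The environment is a hypothetical corridor chosen for brevity, not one of the gridworlds in this notebook: positions 1..L, start at 1, terminal at L, actions left/right, and a reward of −1 per step. Treating $Q(S_{t+1}, A_{t+1})$ as zero at the terminal state is done by never updating the terminal row.

```julia
using Random

# Hypothetical corridor (not from the text): positions 1..L, terminal at L.
# Actions: 1 = left, 2 = right; each step costs -1; moving left from 1 stays at 1.
const L = 6
corridor_step(s, a) = (a == 1 ? max(s - 1, 1) : s + 1, -1.0)

# ε-greedy selection from row Q[s, :]
egreedy(rng, Q, s, ε) = rand(rng) < ε ? rand(rng, 1:2) : argmax(Q[s, :])

function sarsa(; episodes = 2000, α = 0.1, γ = 1.0, ε = 0.1, seed = 1)
    rng = MersenneTwister(seed)
    Q = zeros(L, 2)                 # row L (terminal) is never updated, so it stays 0
    for _ in 1:episodes
        s = 1
        a = egreedy(rng, Q, s, ε)
        while s != L
            s_next, r = corridor_step(s, a)
            a_next = egreedy(rng, Q, s_next, ε)   # A_{t+1} is chosen before updating
            # Uses the full quintuple (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})
            Q[s, a] += α * (r + γ * Q[s_next, a_next] - Q[s, a])
            s, a = s_next, a_next
        end
    end
    return Q
end

Q = sarsa()
println(round.(Q, digits = 2))
```

Because the next action is drawn from the same ε-greedy policy that is being improved, the learned values reflect that exploratory policy rather than the greedy one.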


Actions


Dynamic Programming Code


King Actions


Solutions on Noisy Gridworld

Load Existing Results if Present:

If file does not load correctly, uncheck this box to produce new results.


Exercise 6.6

In Example 6.2 we stated that the true values for the random walk example are 1/6, 2/6, 3/6, 4/6, and 5/6, for states A through E. Describe at least two different ways that these could have been computed. Which would you guess we actually used? Why?

Method 1: Set up the following system of equations that represent the relationship between state values

$$\begin{flalign} V(A) &= \frac{0+V(B)}{2} \implies 2V(A)=V(B) \\ V(B) &= \frac{V(A)+V(C)}{2} \implies 2V(B) = V(A)+V(C)\\ V(C) &= \frac{V(B)+V(D)}{2} \implies 2V(C)=V(B)+V(D)\\ V(D) &= \frac{V(C)+V(E)}{2} \implies 2V(D)=V(C)+V(E)\\ V(E) &= \frac{V(D)+1}{2} \implies 2V(E)=V(D)+1\\ \end{flalign}$$

We can work down from the top equation, expressing everything in terms of A. For brevity, $V(A)$ is written below as $A$, and likewise for the other states:

$$\begin{flalign} B&=2A \\ 2B&=A+C \implies C = 3A \\ 2C&=B+D \implies D = 6A-2A=4A \\ 2D&=C+E \implies E = 8A-3A = 5A \\ 2E &= D + 1 \implies 10A = 4A + 1 \implies A = \frac{1}{6} \end{flalign}$$

Now that we have the value for A, each of the others is simply A multiplied by 2 through 5: $B = 2/6$, $C = 3/6$, $D = 4/6$, and $E = 5/6$.
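The same system can be handed to a linear solver. This is a minimal sketch: the matrix rows are just the rearranged equations $2V(A) - V(B) = 0$, ..., $2V(E) - V(D) = 1$ from above, with the unknowns ordered A through E.

```julia
using LinearAlgebra

# Rows: 2A - B = 0, 2B - A - C = 0, 2C - B - D = 0, 2D - C - E = 0, 2E - D = 1
M = [ 2 -1  0  0  0;
     -1  2 -1  0  0;
      0 -1  2 -1  0;
      0  0 -1  2 -1;
      0  0  0 -1  2]
b = [0, 0, 0, 0, 1]

v = M \ b               # Float64 solution for [V(A), V(B), V(C), V(D), V(E)]
println(v .* 6)         # scaled by 6 to show the numerators 1..5
```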

Method 2: Calculate each value from probability of each trajectory

With this method to get $V(A)$ we would write down every possible trajectory to a terminal state with the associated probability of each. Since trajectories terminating to the left have a value of 0, we only need to add up the trajectories that terminate to the right. Below are some examples for state A.

$$V(A) = 0.5^5 + 4 \times 0.5^7 + \cdots$$

This equation accounts for the single trajectory that takes 5 steps to the right, each with probability one half, plus the 4 possible trajectories that turn around once on the way right, for a total of 7 steps. The sum is infinitely long, since it must include trajectories that bounce back and forth arbitrarily many times. This method is significantly harder to calculate for each state than the first method, and it is more in line with how estimates are calculated with MC sampling. The first method is more analogous to TD updates, which use the bootstrapped form of the Bellman equation.
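The infinite series can still be evaluated numerically without listing trajectories one by one. This sketch propagates probability mass one step at a time and records how much first reaches the right exit at step $n$, which groups the same series by trajectory length. Positions 0 and 6 are assumed to stand for the left and right exits, with A through E at 1 through 5, and the truncation length `nsteps` is an arbitrary choice.

```julia
# Mass first reaching the right exit at each step n, starting from position `start`.
function right_exit_terms(start; nsteps = 200)
    p = zeros(7)                    # p[i + 1] = probability of being at position i
    p[start + 1] = 1.0
    terms = zeros(nsteps)
    for n in 1:nsteps
        q = zeros(7)
        for i in 1:5                # only nonterminal positions move
            q[i]     += 0.5 * p[i + 1]    # step left to position i - 1
            q[i + 2] += 0.5 * p[i + 1]    # step right to position i + 1
        end
        terms[n] = q[7]             # mass newly absorbed at the right exit
        q[1] = 0.0                  # absorbed mass leaves the walk
        q[7] = 0.0
        p = q
    end
    return terms
end

terms = right_exit_terms(1)         # start in state A
println(terms[5], "  ", terms[7], "  ", sum(terms))
```

The 5-step and 7-step entries reproduce the first two terms $0.5^5$ and $4 \times 0.5^7$, and the partial sum approaches 1/6 because the remaining mass shrinks geometrically.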

1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 1] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 1] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 2] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 2] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 2] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 2] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 5, 2] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 6, 2] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 2] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 2] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 3] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 3] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 5, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 6, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 
0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 3] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ;;;; â€¦ [:, :, 1, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 2, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 5, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 6, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 2, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 3, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 5, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 6, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 8, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 2, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 3, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 5, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 6, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 8, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0ªtext/plain’®action_scratch’…¦prefix§Float32¨elements˜’’¤-1.2ªtext/plain’’¤0.95ªtext/plain’’§1.0f-45ªtext/plain’’£0.0ªtext/plain’’§3.0f-45ªtext/plain’’£0.0ªtext/plain’’§3.0f-45ªtext/plain’’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°2ebdf0c7655d58e5Ù!application/vnd.pluto.tree+object’state_scratch’…¦prefix§Float32¨elements›’’§270.546ªtext/plain’’§271.621ªtext/plain’’§272.847ªtext/plain’’¦271.53ªtext/plain’’£0.1ªtext/plain’’£0.1ªtext/plain’’«6.90348f-18ªtext/plain’’ª4.5677f-41ªtext/plain’ ’§6.0f-45ªtext/plain¤more’G’«-2.03361f35ªtext/plain¤type¥Array¬prefix_short 
¨objectid°4860ce2498a311a3Ù!application/vnd.pluto.tree+object’®reward_scratch’…¦prefix§Float32¨elements’’’§3.0f-45ªtext/plain’’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°a02f8b1848408f61Ù!application/vnd.pluto.tree+object’«state_index’…¦prefixÙ1Dict{Main.var"workspace#3".GridworldState, Int64}¨elements›’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡5ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°14e5eae9a48c6749Ù!application/vnd.pluto.tree+object’¢54ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡6ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°4e5052ac2b36c8beÙ!application/vnd.pluto.tree+object’¢39ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡7ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°6d65389daed97014Ù!application/vnd.pluto.tree+object’¢46ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°b85af438304886c5Ù!application/vnd.pluto.tree+object’¢53ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¢10ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°dad6dff35c9621ffÙ!application/vnd.pluto.tree+object’¢64ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡6ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°4e4b90239eb3be65Ù!application/vnd.pluto.tree+object’¢42ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°6d43cd1ca99a553eÙ!application/vnd.pluto.tree+object’¢50ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡2ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°166e372c47e8ffa6Ù!application/vnd.pluto.tree+object’¢10ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡5ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°f8402269233868c7Ù!application/vnd.pluto.tree+object’¢31ªtext/plain’’…¦prefix®GridworldSta
te¨elements’’¡x’¡8ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°b08053c76dcd8072Ù!application/vnd.pluto.tree+object’¢56ªtext/plain¤more¤type¤Dict¬prefix_short¤Dict¨objectid°e70499b329487769Ù!application/vnd.pluto.tree+object’¬action_index’…¦prefixÙ2Dict{Main.var"workspace#3".GridworldAction, Int64}¨elements˜’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’¡2ªtext/plain’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’¡3ªtext/plain’’…¦prefix§UpRight¨elements¤type¦struct¬prefix_short§UpRight¨objectid°ffffffff65dca132Ù!application/vnd.pluto.tree+object’¡5ªtext/plain’’…¦prefix©DownRight¨elements¤type¦struct¬prefix_short©DownRight¨objectid°ffffffff97f641f9Ù!application/vnd.pluto.tree+object’¡7ªtext/plain’’…¦prefix¨DownLeft¨elements¤type¦struct¬prefix_short¨DownLeft¨objectid°ffffffffd243dd41Ù!application/vnd.pluto.tree+object’¡8ªtext/plain’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object’¡4ªtext/plain’’…¦prefix¦UpLeft¨elements¤type¦struct¬prefix_short¦UpLeft¨objectid°ffffffff68f3503eÙ!application/vnd.pluto.tree+object’¡6ªtext/plain’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’¡1ªtext/plain¤type¤Dict¬prefix_short¤Dict¨objectid°69b123a92ea18d23Ù!application/vnd.pluto.tree+object¤type¦struct¬prefix_short©FiniteMDP¨objectid°311b7b18ca3c1a72¤mimeÙ!application/vnd.pluto.tree+object¬rootassignee»const king_gridworld_mdp_dp²last_run_timestampËAÚš‚G°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4¹depends_on_disabled_cellsÂ§runtimeÎÕ…µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$bc8bad61-a49a-47d6-8fa6-7dcf6c221910Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ,example_6_1 (generic function with 1 
method)¤mimeªtext/plain¬rootassigneeÀ²last_run_timestampËAÚ•âJÁô°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$bc8bad61-a49a-47d6-8fa6-7dcf6c221910¹depends_on_disabled_cellsÂ§runtimeÎi§çµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$2455742f-dc18-4d6b-9f58-5666adac6919Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ6create_car_rental_mdp (generic function with 1 method)¤mimeªtext/plain¬rootassigneeÀ²last_run_timestampËAÚšª*°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$2455742f-dc18-4d6b-9f58-5666adac6919¹depends_on_disabled_cellsÂ§runtimeÎåRÖµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$f474fcbd-e3c3-49fd-a6b7-6d6a8a7dda09Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ=

Informal Proof for Bias

example_6_7_mdp (generic function with 1 method)

create_gridworld_mdp (generic function with 2 methods)

Number of Variables:


speed:

make_ϵ_greedy_policy! (generic function with 1 method)

The right graph shows learning curves for the two methods for various values of α. The performance measure shown is the root mean square (RMS) error between the value function learned and the true value function, averaged over the 5 states, then averaged over 100 runs. In all cases the approximate value function was initialized to the intermediate value 0.5. The TD method was consistently better than the MC method on this task.
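The measurement described above (RMS error over the 5 states after each episode, averaged over 100 runs, with every estimate starting at 0.5) can be sketched in a few lines of Julia. This is a minimal sketch, not the notebook's own code: it assumes the standard five-state random walk of Example 6.2 (start in the centre state C, reward +1 only when exiting on the right, no discounting, so the true values are 1/6 through 5/6), and the names `random_walk_episode`, `td0!`, `constant_α_mc!` and `learning_curve`, as well as the step sizes 0.1 and 0.02, are hypothetical choices for illustration.

```julia
using Random, Statistics

# Assumed setup (Example 6.2): nonterminal states 1..5 (A..E), terminals 0 and 6,
# every episode starts in C (state 3), reward +1 only when exiting right, γ = 1.
const TRUE_V = collect(1:5) ./ 6   # true values 1/6, 2/6, ..., 5/6

# Hypothetical helper: one episode as a vector of (state, reward, next_state) transitions.
function random_walk_episode(rng::AbstractRNG)
    s = 3
    transitions = Tuple{Int,Float64,Int}[]
    while 1 <= s <= 5
        s′ = s + rand(rng, (-1, 1))      # step left or right with equal probability
        r = s′ == 6 ? 1.0 : 0.0
        push!(transitions, (s, r, s′))
        s = s′
    end
    return transitions
end

# RMS error between the estimate and the true values, averaged over the 5 states.
rms_error(V) = sqrt(mean((V .- TRUE_V) .^ 2))

# TD(0): after each step, move V(s) toward r + V(s′); terminal states have value 0.
function td0!(V, episode, α)
    for (s, r, s′) in episode
        v_next = 1 <= s′ <= 5 ? V[s′] : 0.0
        V[s] += α * (r + v_next - V[s])
    end
    return V
end

# Constant-α every-visit MC: after the episode, move each visited V(s) toward its return G.
function constant_α_mc!(V, episode, α)
    G = 0.0
    for (s, r, _) in reverse(episode)   # accumulate the undiscounted return backwards
        G += r
        V[s] += α * (G - V[s])
    end
    return V
end

# RMS error after each episode, averaged over `runs` independent runs, V initialized to 0.5.
function learning_curve(update!, α; episodes = 100, runs = 100, seed = 1)
    rng = MersenneTwister(seed)
    curve = zeros(episodes)
    for _ in 1:runs
        V = fill(0.5, 5)
        for ep in 1:episodes
            update!(V, random_walk_episode(rng), α)
            curve[ep] += rms_error(V)
        end
    end
    return curve ./ runs
end

td_curve = learning_curve(td0!, 0.1)
mc_curve = learning_curve(constant_α_mc!, 0.02)
println("RMS after 100 episodes  TD(0): ", round(td_curve[end]; digits = 3),
        "  MC: ", round(mc_curve[end]; digits = 3))
```

Sweeping several α values for each method and plotting `td_curve` and `mc_curve` against the episode number gives the kind of comparison the paragraph describes; the exact numbers depend on the seed and the step sizes chosen.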
FiniteMDP{Float32, Tuple{Int64, Int64}, Int64}
  states: 441 (Int64, Int64) tuples, (0, 0) through (20, 20)
  actions: Int64[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
  rewards: 189 Float32 values, from -10.0 to 380.0
  ptf: 441×189×11×441 Array{Float32, 4}
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000321683 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 11, 439] = 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000168458 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 440] = 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹± â‹® 0.00024682 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000164546 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.22732f-5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.29093f-5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.51041f-5 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 440] = 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.00012341 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000164546 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.22732f-5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.80134f-5 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 440] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00012341 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000164546 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000130287 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ;;; â€¦ [:, :, 9, 440] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000525983 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 10, 440] = 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000321683 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 11, 440] = 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000168458 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 
1, 441] = 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹± â‹® 0.00012341 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000164546 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.22732f-5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.80134f-5 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 441] = 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00012341 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000164546 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000130287 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 441] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00012341 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00024682 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000294833 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ;;; â€¦ [:, :, 9, 441] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000525983 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 10, 441] = 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.000321683 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 11, 441] = 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® â‹± â‹® 0.0 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000168458 0.0 0.0 0.0 0.0 â€¦ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0ªtext/plain’®action_scratch’…¦prefix§Float32¨elements›’’«-1.55978f29ªtext/plain’’«-2.64806f36ªtext/plain’’«-1.69975f38ªtext/plain’’£NaNªtext/plain’’£NaNªtext/plain’’£NaNªtext/plain’’£NaNªtext/plain’’£NaNªtext/plain’ ’£NaNªtext/plain’ ’£NaNªtext/plain’’£NaNªtext/plain¤type¥Array¬prefix_short ¨objectid®e9a7fe1919d4ebÙ!application/vnd.pluto.tree+object’state_scratch’…¦prefix§Float32¨elements›’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’ ’£0.0ªtext/plain¤more’Íº’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°90562f95b7f2e378Ù!application/vnd.pluto.tree+object’®reward_scratch’…¦prefix§Float32¨elements›’’¨0.037517ªtext/plain’’ª4.5677f-41ªtext/plain’’©0.0375508ªtext/plain’’ª4.5677f-41ªtext/plain’’«5.53055f-32ªtext/plain’’ª4.5677f-41ªtext/plain’’«1.06974f-31ªtext/plain’’ª4.5677f-41ªtext/plain’ ’©0.0106894ªtext/plain¤more’Ì½’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°257631496232e689Ù!application/vnd.pluto.tree+object’«state_index’…¦prefixÙ Dict{Tuple{Int64, Int64}, 
Int64}¨elements›’’ƒ¨elements’’’¢11ªtext/plain’’¢17ªtext/plain¤type¥Tuple¨objectid°49ec9371b177a25dÙ!application/vnd.pluto.tree+object’£249ªtext/plain’’ƒ¨elements’’’¢16ªtext/plain’’¢14ªtext/plain¤type¥Tuple¨objectid°d93d095a02371a59Ù!application/vnd.pluto.tree+object’£351ªtext/plain’’ƒ¨elements’’’¢18ªtext/plain’’¢16ªtext/plain¤type¥Tuple¨objectid°aeb6f295858259dbÙ!application/vnd.pluto.tree+object’£395ªtext/plain’’ƒ¨elements’’’¢17ªtext/plain’’¢12ªtext/plain¤type¥Tuple¨objectid¯68544eea78f6641Ù!application/vnd.pluto.tree+object’£370ªtext/plain’’ƒ¨elements’’’¡8ªtext/plain’’¢15ªtext/plain¤type¥Tuple¨objectid°ceff527f41a09840Ù!application/vnd.pluto.tree+object’£184ªtext/plain’’ƒ¨elements’’’¢16ªtext/plain’’¢16ªtext/plain¤type¥Tuple¨objectid°3164689f12bc7404Ù!application/vnd.pluto.tree+object’£353ªtext/plain’’ƒ¨elements’’’¢19ªtext/plain’’¢14ªtext/plain¤type¥Tuple¨objectid°cb90bf273945b2c8Ù!application/vnd.pluto.tree+object’£414ªtext/plain’’ƒ¨elements’’’¡7ªtext/plain’’¢18ªtext/plain¤type¥Tuple¨objectid°f3c6affef4f32144Ù!application/vnd.pluto.tree+object’£166ªtext/plain’’ƒ¨elements’’’¡7ªtext/plain’’¡8ªtext/plain¤type¥Tuple¨objectid°300559d2f34a9666Ù!application/vnd.pluto.tree+object’£156ªtext/plain’’ƒ¨elements’’’¢14ªtext/plain’’¢15ªtext/plain¤type¥Tuple¨objectid°ac753ed572b44c1dÙ!application/vnd.pluto.tree+object’£310ªtext/plain¤more¤type¤Dict¬prefix_short¤Dict¨objectid°c3ad687634c83340Ù!application/vnd.pluto.tree+object’¬action_index’…¦prefix²Dict{Int64, Int64}¨elements›’’¡5ªtext/plain’¢11ªtext/plain’’¢-3ªtext/plain’¡3ªtext/plain’’¡1ªtext/plain’¡7ªtext/plain’’¡0ªtext/plain’¡6ªtext/plain’’¡4ªtext/plain’¢10ªtext/plain’’¢-5ªtext/plain’¡1ªtext/plain’’¢-1ªtext/plain’¡5ªtext/plain’’¡2ªtext/plain’¡8ªtext/plain’’¢-2ªtext/plain’¡4ªtext/plain’’¢-4ªtext/plain’¡2ªtext/plain¤more¤type¤Dict¬prefix_short¤Dict¨objectid°f407545a157c0a2bÙ!application/vnd.pluto.tree+object¤type¦struct¬prefix_short©FiniteMDP¨objectid¯15204eb150d1284¤mimeÙ!application/vnd.pluto.tree+object¬rootassignee³const 
jacks_car_mdp²last_run_timestampËAÚš€ÉÉ°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$c2f56287-9a3e-454a-9ec1-53184b788db9¹depends_on_disabled_cellsÂ§runtimeÏ4¾Hmµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$18e60b1d-97ec-432c-a388-003e7fae415fŠ¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ7bellman_optimal_value! (generic function with 1 method)¤mimeªtext/plain¬rootassigneeÀ²last_run_timestampËAÚšH€(°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$18e60b1d-97ec-432c-a388-003e7fae415f¹depends_on_disabled_cellsÂ§runtimeÎY"øµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6Š¦queuedÂ¤logs§runningÂ¦output†¤bodyƒ¨elements”’’…¦prefix¥Int64¨elements›’’¡3ªtext/plain’’¡2ªtext/plain’’¡3ªtext/plain’’¡4ªtext/plain’’¡3ªtext/plain’’¡4ªtext/plain’’¡3ªtext/plain’’¡4ªtext/plain’ ’¡3ªtext/plain¤more’’¡1ªtext/plain¤type¥Array¬prefix_short ¨objectid°3a70a1eb8ec67a61Ù!application/vnd.pluto.tree+object’’…¦prefix¥Int64¨elements›’’¡1ªtext/plain’’¡1ªtext/plain’’¡1ªtext/plain’’¡1ªtext/plain’’¡1ªtext/plain’’¡1ªtext/plain’’¡1ªtext/plain’’¡1ªtext/plain’ ’¡1ªtext/plain¤more’’¡1ªtext/plain¤type¥Array¬prefix_short ¨objectid°fd5795729494610cÙ!application/vnd.pluto.tree+object’’…¦prefix§Float32¨elements›’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’’£0.0ªtext/plain’ ’£0.0ªtext/plain¤more’’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°17b12935c947880aÙ!application/vnd.pluto.tree+object’’¡0ªtext/plain¤type¥Tuple¨objectid°e1b5fb187f6eb83d¤mimeÙ!application/vnd.pluto.tree+object¬rootassigneeÀ²last_run_timestampËAÚ•å¦«†°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6¹depends_on_disabled_cellsÂ§runtimeÍdøµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6bŠ¦queuedÂ¤logs§runningÂ¦output†¤bodyÚj„ 
¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚ•æÁæ°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6b¹depends_on_disabled_cellsÂ§runtimeÎp·þµpublished_object_keys‘Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/c69864c8f78f9c34¸depends_on_skipped_cellsÂ§erroredÂÙ$0201ae9f-4a31-497e-86ab-62b454ca85deŠ¦queuedÂ¤logs§runningÂ¦output†¤bodyÚ

Notice that above about $\alpha = 0.25$, Q-learning sometimes produces diverging values, and therefore episodes that fail to terminate, whereas Double Q-learning avoids that problem even at large learning rates.
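To make the comparison concrete, here is a minimal Python sketch of the two tabular updates (the notebook's own Julia `q_learning` implementation is not shown here). The function names `q_learning_update` and `double_q_update`, the NumPy layout `Q[state, action]`, and the `terminal` flag are illustrative assumptions. The key difference is that Q-learning uses one table both to choose and to evaluate the maximizing next action, while Double Q-learning splits those two roles across two tables.

```python
import numpy as np


def q_learning_update(Q, s, a, r, s_next, alpha, gamma, terminal=False):
    """One tabular Q-learning step: the same table selects and evaluates the max."""
    target = r if terminal else r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])


def double_q_update(Q1, Q2, s, a, r, s_next, alpha, gamma, rng, terminal=False):
    """One tabular Double Q-learning step.

    With probability 0.5 the roles of the two tables are swapped. The table being
    updated picks the greedy next action and the other table evaluates it, so
    noise in one table's maximum is not fed straight back into its own target.
    """
    if rng.random() < 0.5:
        Q1, Q2 = Q2, Q1  # swap roles; both names still refer to the caller's arrays
    if terminal:
        target = r
    else:
        a_star = int(np.argmax(Q1[s_next]))      # selection with the updated table
        target = r + gamma * Q2[s_next, a_star]  # evaluation with the other table
    Q1[s, a] += alpha * (target - Q1[s, a])
```

In a full agent the behavior policy would typically be ε-greedy with respect to `Q1 + Q2`, and only one of the two tables changes on each step.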


Let's first consider the prediction problem for afterstates: how to compute the afterstate value function and how it could be used for policy improvement. We will use the notation $W(y)$ for the value of afterstate $y$, while $V(s)$ still denotes the value of state $s$. From the earlier definitions, we can derive the relationship between the state and afterstate value functions.

Recall that:

$$\begin{flalign} G_t &\doteq R_t + \gamma R_{t+1} + \cdots \\ V_\pi(s) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s] \\ & = \mathbb{E}_\pi[R_t + \gamma V_\pi(S_{t+1}) \mid S_t = s] \\ &= \sum_a \pi(a \vert s) \sum_{r, s^\prime} p(r, s^\prime \vert s, a) \left ( r + \gamma V_\pi(s^\prime) \right ) \end{flalign}$$

Representing the trajectory with afterstates and only considering the reward following an afterstate, we also know that:

$$\begin{flalign} G_t &\doteq R_t + \gamma(P_{t+1} + R_{t+1} + \gamma(P_{t+2} + R_{t+2} + \cdots))\\ W_\pi(y) &\doteq \mathbb{E}_\pi[G_t \mid Y_t = y] \\ & = \mathbb{E}_\pi[R_t + \gamma \left (P_{t+1} + W_\pi(Y_{t+1}) \right ) \mid Y_t = y] \\ &= \sum_{r, s^\prime} p(r, s^\prime \vert y) \left [r + \gamma \sum_{a^\prime} \pi(a^\prime \vert s^\prime) \left ( f_2(s^\prime, a^\prime) + W_\pi(f_1(s^\prime, a^\prime)) \right ) \right ] \end{flalign}$$
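The last line of this afterstate Bellman equation can be turned into an iterative policy-evaluation sweep. The Python sketch below applies it to a made-up two-state, two-action problem; `f1`, `f2`, `AFTER_MODEL` (standing in for $p(r, s^\prime \vert y)$), and the uniform policy `PI` are hypothetical and chosen only so the backup has something to act on. Because $\gamma < 1$, repeated backups should converge to $W_\pi$.

```python
import numpy as np

# Hypothetical toy problem (not from the notebook): 2 states, 2 actions, 2 afterstates.
GAMMA = 0.9
N_STATES, N_ACTIONS, N_AFTER = 2, 2, 2


def f1(s, a):
    """Deterministic afterstate reached by taking action a in state s."""
    return (s + a) % N_AFTER


def f2(s, a):
    """Deterministic reward component received when taking action a in state s."""
    return -1.0 if a == 1 else 0.0


# p(r, s' | y): a list of (probability, reward, next_state) for each afterstate y.
AFTER_MODEL = {
    0: [(0.8, 1.0, 0), (0.2, 0.0, 1)],
    1: [(0.5, 2.0, 1), (0.5, -1.0, 0)],
}

# Uniform random policy pi(a' | s').
PI = np.full((N_STATES, N_ACTIONS), 1.0 / N_ACTIONS)


def afterstate_backup(W):
    """Apply the afterstate Bellman expectation operator once."""
    W_new = np.zeros_like(W)
    for y, outcomes in AFTER_MODEL.items():
        for prob, r, s_next in outcomes:
            # Expected deterministic reward plus next afterstate value under pi.
            cont = sum(PI[s_next, a] * (f2(s_next, a) + W[f1(s_next, a)])
                       for a in range(N_ACTIONS))
            W_new[y] += prob * (r + GAMMA * cont)
    return W_new


def evaluate_afterstates(tol=1e-10, max_iters=10_000):
    """Iterate the backup until successive estimates differ by less than tol."""
    W = np.zeros(N_AFTER)
    for _ in range(max_iters):
        W_new = afterstate_backup(W)
        if np.max(np.abs(W_new - W)) < tol:
            return W_new
        W = W_new
    return W
```

Note that the policy only enters through `PI[s_next, a]`, matching the observation that the transition out of the afterstate itself does not depend on $\pi$.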

Notice that, compared with the state-value function, the policy only affects this expectation through the action taken from the next state $s^\prime$. The initial transition from the afterstate to $s^\prime$ depends only on the new transition function $p(r, s^\prime \vert y)$, which is conditioned only on the afterstate.

Recall that to improve a policy $\pi$ for which we have a value function $V_\pi$, we select the greedy policy with respect to $V_\pi$, meaning $\pi^{\prime} (s) = \mathrm{argmax}_a \sum_{r, s^\prime} p(r, s^\prime \vert s, a)(r + \gamma V_\pi(s^\prime))$. If we do not have access to the full transition probability function, we cannot compute this explicitly. Nor can we estimate it from a single trajectory, because from each state we observe only one transition, generated by whatever action the behavior policy chose at the time. That's why, for MDPs that do not provide the full transition function, we prefer to estimate the state-action value function $Q(s, a)$, since with that function policy improvement reduces to $\pi^{\prime} (s) = \mathrm{argmax}_a Q(s, a)$.
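The contrast between the two improvement steps can be sketched in Python as follows. `greedy_from_v` needs the model (here an assumed nested list `model[s][a]` of `(probability, reward, next_state)` tuples), whereas `greedy_from_q` only needs the action-value table. Both function names and the model layout are hypothetical.

```python
import numpy as np


def greedy_from_q(Q):
    """Policy improvement from action values: no transition model needed."""
    return np.argmax(Q, axis=1)


def greedy_from_v(V, model, gamma):
    """Policy improvement from state values: requires p(r, s' | s, a).

    model[s][a] is a list of (probability, reward, next_state) tuples.
    """
    policy = np.zeros(len(model), dtype=int)
    for s, actions in enumerate(model):
        # One-step lookahead through the model for every action in state s.
        returns = [sum(p * (r + gamma * V[s2]) for p, r, s2 in outcomes)
                   for outcomes in actions]
        policy[s] = int(np.argmax(returns))
    return policy
```

Without `model`, there is no way to evaluate the inner sum in `greedy_from_v`, which is exactly the obstacle described above.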


Maximization Bias Visualization for a Single Estimator





Random Walk Visualization Code


Normal Actions


[Figure: Sarsa Solution; panels: Actions, Wind Values]

[Figure: Value Iteration Solution; panels: Actions, Wind Values]


The following argument is taken from "Double Q-learning" by Hado van Hasselt published in Advances in Neural Information Processing Systems 23 (NIPS 2010):

Consider a set of $M$ random variables $X=\{X_1, \dots, X_M\}$. We would like to calculate:

$$\max_i \mathbb{E} \{X_i\} \tag{a}$$

Without any knowledge of the underlying distribution of each $X_i$ it is impossible to determine $(a)$ exactly. Most often we would approximate it by first constructing approximations for $\mathbb{E} \{ X_i \} \: \forall \: i$. Let $S = \bigcup_{i=1}^M S_i$ denote the set of samples, where $S_i$ is the subset containing samples for the variable $X_i$. We assume that the samples in $S_i$ are independent and identically distributed (iid). Unbiased estimates for the expected values can be obtained by computing the sample average for each variable: $\mathbb{E} \{ X_i \} = \mathbb{E} \{ \mu_i \} \approx \mu_i(S) \doteq \frac{1}{\vert S_i \vert } \sum_{s \in S_i} s$, where $\mu_i$ is an estimator for the variable $X_i$. This approximation is unbiased since every sample $s \in S_i$ is an unbiased estimate for the value of $\mathbb{E} \{ X_i \}$. The error in the approximation thus consists solely of the variance in the estimator and decreases when we obtain more samples. We use the following notation: $f_i$ denotes the probability density function (PDF) of the $i^{th}$ variable $X_i$ and $F_i(x) = \int_{-\infty}^{x} f_i(t)\,dt$ is the cumulative distribution function (CDF) of this PDF. Similarly, the PDF and CDF of the $i^{th}$ estimator are denoted $f_i^\mu$ and $F_i^\mu$. The maximum expected value can be expressed in terms of the underlying PDFs as $\max_i \mathbb{E} \{ X_i \} = \max_i \int_{-\infty}^\infty x f_i(x)\,dx$.

An obvious way to approximate the value of $(a)$ is to use the value of the maximal estimator:

$$\max_i \mathbb{E} \{ X_i \} = \max_i \mathbb{E} \{ \mu_i \} \approx \max_i \mu_i(S) \tag{b}$$

and this is the estimator employed in ordinary Q-learning. This estimator is distributed according to some PDF $f_{\max}^\mu$ that depends on the PDFs of the estimators $f_i^\mu$. To determine this PDF, consider the CDF $F_{\max}^\mu(x)$, which gives the probability that the maximum estimate is lower than or equal to $x$. This probability is equal to the probability that all the estimates are lower than or equal to $x$, which, assuming the estimators are independent, factorizes as $F_{\max}^\mu(x) \doteq P(\max_i \mu_i \leq x) = \prod_{i=1}^M P(\mu_i\leq x) \doteq \prod_{i=1}^M F_i ^\mu (x)$. The value $\max_i \mu_i(S)$ is an unbiased estimate for $\mathbb{E} \{ \max_j \mu_j \} = \int_{-\infty}^{\infty} x f_{\max}^\mu(x)dx$, which can thus be given by:

$$\mathbb{E} \{ \max_j \mu_j \} = \int_{-\infty}^{\infty} x \frac{d}{dx} \prod_{i=1}^M F_i ^ \mu (x) dx = \sum_{j=1}^M \int_{-\infty}^{\infty}x f_j ^ \mu (x) \prod_{i \neq j}^M F_i ^ \mu(x) dx \tag{c}$$
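As a hedged numerical check of this expression, the sketch below assumes $M$ independent variables that are all $X_i \sim \mathcal{N}(0, 1)$, so $\max_i \mathbb{E}\{X_i\} = 0$, and that each estimator averages $n$ samples, so $\mu_i \sim \mathcal{N}(0, 1/n)$. It evaluates $(c)$ by trapezoidal integration and compares it with a Monte Carlo average of $\max_i \mu_i(S)$. The function names and the Gaussian setup are illustrative assumptions, not part of the original argument.

```python
import math

import numpy as np


def max_estimator_bias_mc(M, n, runs, rng):
    """Monte Carlo estimate of E{max_j mu_j} when every X_i ~ N(0, 1).

    The true max_i E{X_i} is 0, so a positive result is bias of the max estimator (b).
    """
    samples = rng.standard_normal((runs, M, n))
    mu = samples.mean(axis=2)     # sample-average estimator mu_i(S) for each variable
    return mu.max(axis=1).mean()  # average of max_i mu_i(S) over independent runs


def max_estimator_expectation_integral(M, n, num=20_001):
    """Evaluate equation (c) numerically for M identical estimators mu_i ~ N(0, 1/n)."""
    sigma = 1.0 / math.sqrt(n)
    x = np.linspace(-10 * sigma, 10 * sigma, num)
    pdf = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    cdf = np.array([0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2)))) for v in x])
    # With identical estimators all M terms of (c) coincide: M * x * f(x) * F(x)^(M-1).
    integrand = M * x * pdf * cdf ** (M - 1)
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))
```

For two such variables with one sample each, both functions should land near $1/\sqrt{\pi} \approx 0.564$, a clearly positive value even though every true mean is zero.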

However, in $(a)$ the order of the max and expectation operators is reversed. The following illustrates why $(c)$ has a positive bias.


Adding the no-movement action doesn't seem to change the shortest path length of 7 steps.

Exercise 6.12

Suppose action selection is greedy. Is Q-learning then exactly the same algorithm as Sarsa? Will they make exactly the same action selections and weight updates?

Consider both updates when the greedy policy is followed during training.

Sarsa update:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]$$

with $A_{t+1}$ chosen by the greedy policy, $A_{t+1} = \arg\max_a Q(S_{t+1}, a)$, using the estimates prior to this update.

Q-learning update:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma \max_a Q(S_{t+1}, a) - Q(S_t, A_t)]$$

The value updates are identical, since in both cases the bootstrapped estimate is $Q(S_{t+1}, a)$ at the maximizing action $a$ in state $S_{t+1}$. In Sarsa, $A_{t+1}$ has already been selected before this update occurs, so the update reflects the next step in the trajectory. In Q-learning, the action at $S_{t+1}$ is selected after the update step. The update changes only $Q(S_t, A_t)$. When $S_{t+1} \neq S_t$ it leaves every $Q(S_{t+1}, \cdot)$ untouched, so the next action selection is unaffected and both methods pick the same action.

However, there is one exception: a transition that leaves the state unchanged, $S_t = S_{t+1}$. In this case the update can affect the next action selection. For example, if a very low reward is received, the estimate for the action selected on step $t$ drops, and that action may no longer be maximizing on step $t+1$. Sarsa would then keep the action it chose before the update, while Q-learning would choose a different action on the next step even though the state is unchanged. Despite this difference, both methods are still estimating the state-action value function of the optimal policy. Neither is guaranteed to converge to it, because purely greedy action selection violates the assumption that all state-action pairs continue to be visited during training.
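The self-loop exception can be made concrete with a tiny hypothetical example in Julia. A single state has two actions, and the action taken leads straight back to the same state. The values $Q = (1.0, 0.5)$, the reward $-10$, and $\alpha = 0.5$, $\gamma = 0.9$ are made up for illustration, as are the function and variable names. Both methods produce the same value update. Sarsa has already committed to action 1, whereas Q-learning re-selects after the update and switches to action 2.

```julia
# Hypothetical one-state example: the action taken in state s leads straight back to s.
greedy(q) = argmax(q)   # greedy action; argmax returns the first maximizer on ties

function greedy_sarsa_vs_q(q0::Vector{Float64}, r::Float64; α = 0.5, γ = 0.9)
    a = greedy(q0)                        # A_t, the same for both methods

    # Sarsa: A_{t+1} is selected greedily BEFORE the update (S_{t+1} == S_t here)
    qs = copy(q0)
    a_next_sarsa = greedy(qs)
    qs[a] += α * (r + γ * qs[a_next_sarsa] - qs[a])

    # Q-learning: bootstrap from the max, then select A_{t+1} AFTER the update
    qq = copy(q0)
    qq[a] += α * (r + γ * maximum(qq) - qq[a])
    a_next_q = greedy(qq)

    return qs, qq, a_next_sarsa, a_next_q
end

qs, qq, a_sarsa, a_q = greedy_sarsa_vs_q([1.0, 0.5], -10.0)
qs == qq          # true: identical value updates
(a_sarsa, a_q)    # (1, 2): different next actions
```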

At each state or checkpoint you try to predict how much longer it will take to get home using any information that is relevant. Notice that regardless of how inaccurate we were on previous steps, we can still make an accurate prediction for the time to go.

| State | Elapsed Time (minutes) | Predicted Time to Go (minutes) | Predicted Total Time (minutes) |
|---|---|---|---|
| leaving office, Friday at 6 | 0 | 30 | 30 |
| reach car, raining | 5 | 35 | 40 |
| exiting highway | 20 | 15 | 35 |
| secondary road, behind truck | 30 | 10 | 40 |
| entering home street | 40 | 3 | 43 |
| arriving home | 43 | 0 | 43 |

The rewards in this example are the elapsed times on each leg of the journey and there is no discounting, thus the return for each state is the actual time to go from that state. The value of each state is the expected time to go. The second column of numbers gives the current estimated value for the state encountered.

A simple way to view the operation of Monte Carlo methods is to plot the predicted total time (the last column) over the sequence. For each state, a Monte Carlo method compares that prediction with the actual total time, which was 43 minutes.
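The difference between the Monte Carlo and TD(0) views can be computed directly from the table. The Julia sketch below follows the description above: there is no discounting, each leg's duration is the reward, and arriving home is the terminal state with value 0. The variable names are illustrative. With the value estimates held fixed, the TD errors from any state onward sum to that state's Monte Carlo error.

```julia
# Driving-home data from the table above (minutes).
elapsed    = [0, 5, 20, 30, 40, 43]   # elapsed time at each state
time_to_go = [30, 35, 15, 10, 3, 0]   # predicted time to go V(S_t); 0 at home (terminal)

rewards = diff(elapsed)               # R_{t+1}: duration of each leg, with γ = 1

# Monte Carlo error: actual return (43 - elapsed) minus the prediction
mc_errors = [(elapsed[end] - elapsed[t]) - time_to_go[t] for t in 1:5]

# TD(0) error: R_{t+1} + V(S_{t+1}) - V(S_t)
td_errors = [rewards[t] + time_to_go[t+1] - time_to_go[t] for t in 1:5]

mc_errors        # [13, 3, 8, 3, 0]
td_errors        # [10, -5, 5, 3, 0]
sum(td_errors)   # 13, equal to the Monte Carlo error of the first state
```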

Afterstate Implementation

To understand the origin of the bias, consider two variables $X$ and $Y$ that both follow a standard normal distribution, with only a single sample from each. In this case our estimate of the maximum expected value is just $\max(x, y)$, where $x$ and $y$ are the samples from $X$ and $Y$ respectively. The expected value of this estimator can be calculated from the distribution of the maximum of two independent standard normal random variables:

$$\mathbb{E}\left [ \text{max}(\mathcal{N}(0, 1), \mathcal{N}(0, 1)) \right ] = \frac{1}{\sqrt{\pi}} \approx 0.564$$

Indeed, on the plot for 2 variables with 1 sample collected for each, the average observed value is 0.56, and the value increases as more variables are added to the list. So our estimate apparently has a positive bias, even though all the underlying variables have exactly the same distribution and the true maximum expected value is 0. With more samples per variable, we would use the distribution of the sample average rather than a single sample, and that distribution has a variance inversely proportional to the number of samples. So the bias converges to zero in the limit of infinite samples, and in the graph the bias does in fact shrink toward zero as more samples are collected.
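The experiment behind the plot can be sketched in a few lines of Julia. This is an assumed reconstruction, not the notebook's actual plotting code. Every variable is standard normal, each estimate is the mean of `n` samples, and the function name, trial count and seed are illustrative.

```julia
using Random, Statistics

# ASSUMED setup: all M variables are iid N(0, 1) and each μ_i is the mean of n samples.
# The true max_i E[X_i] is 0, so the returned average is the bias of the single estimator.
function single_estimator_bias(M::Integer, n::Integer; trials = 200_000, seed = 0)
    rng = MersenneTwister(seed)
    total = 0.0
    for _ in 1:trials
        μ = [mean(randn(rng, n)) for _ in 1:M]   # one unbiased estimate per variable
        total += maximum(μ)                       # the single (maximal) estimator
    end
    return total / trials
end

single_estimator_bias(2, 1)    # close to 1/√π ≈ 0.564
single_estimator_bias(2, 10)   # about 0.564 / √10 ≈ 0.18
single_estimator_bias(5, 1)    # larger with more variables (≈ 1.16)
```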

There is a method for removing this positive bias, the so-called double estimator, which was first introduced by Hado van Hasselt in a paper published at NIPS 2010. Below is a more thorough overview of the paper, but first I will provide a conceptual sketch of the proof.

First consider a set of $M$ random variables $X = \{X_1, \dots, X_M \}$ and our goal is to estimate: $\max_i \mathbb{E} \{ X_i \}$.

In the single estimator case, we draw samples from each variable and construct an unbiased estimator $\mu_i$ for each mean. After collecting a set of samples, we assume that whichever estimator (or set of estimators) has the maximum value corresponds to the variable with the maximum expected value. If the distributions of the estimators do not overlap at all, they will always be ranked in the same order as the true expected values, and our estimate will be unbiased. However, if there is any overlap in the underlying distributions (including the case where all distributions are identical), then there is some non-zero probability that a true maximizing index is NOT among the indices of the maximum estimates. Say the apparent maximizing index from the sample is $s^*$ while one of the true maximizing indices is $j \neq s^*$. Our final estimate for the maximum expected value is then $\mu_{s^*}$. By assumption $\mathbb{E} \{ X_j \} = \max_i \mathbb{E} \{X_i \}$, and since $\mu_j$ is unbiased, $\mathbb{E} \{ \mu_j\} = \max_i \mathbb{E} \{X_i \}$, which is the true value we want. In the sample, however, $\mu_{s^*} \geq \mu_j$. Because $\max_i \mu_i \geq \mu_j$ holds for every sample, taking expectations gives $\mathbb{E} \{ \max_i \mu_i \} \geq \max_i \mathbb{E} \{X_i\}$, so on average the estimator is at least as large as the true answer. This holds even if all the variables share the same distribution and every index is a true maximizer. Every estimate then has the same expected value, which is the true answer, yet the one estimate used to calculate the maximum is always at least as large as all of those unbiased alternatives. The underlying reason for the overestimate is that in any finite sample we cannot be sure of the correct maximizing index. Any variable whose samples happen to be high enough to exceed the true maximum will be selected to represent that maximum.

In the double estimator case, we split the samples into two disjoint sets $\mathcal{A}$ and $\mathcal{B}$ ($\mathcal{A} \cap \mathcal{B} = \emptyset$) and form a set of estimators from each, $\mu_i^\mathcal{A}$ and $\mu_i^\mathcal{B}$. Let $a^*$ be one of the indices with the maximum estimated value in set $\mathcal{A}$. Again, if the underlying distributions overlap at all, there is some probability that this index is not among the true maximizing indices. If all the distributions are equal, every index is a true maximizer, so whichever index we pick is correct. To estimate the value of the maximum, we take $\mu_{a^*}^\mathcal{B}$, which is the estimate from set $\mathcal{B}$ at the maximizing index from set $\mathcal{A}$. The samples in $\mathcal{B}$ were not used to choose $a^*$, so this is still an unbiased estimate of $\mathbb{E}\{X_{a^*}\}$. If the index happens to be correct, we therefore have an unbiased estimate of the true value. If the index is wrong, however, we are estimating the expected value of a non-maximizing variable from a fresh set of samples. By the definition of the maximizing indices, in this case $\mathbb{E} \{ \mu_{a^*}^\mathcal{B} \} < \max_i \mathbb{E} \{ X_i \}$, resulting in a negative bias for our estimate. Like the single estimator, this estimate is unbiased if there is no overlap in the underlying probability distributions for each variable. Unlike the single estimator, it is also unbiased if all the underlying distributions are equal.
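A matching Julia sketch of the double estimator shows both behaviors described above: no bias when all distributions are equal, and a negative bias when distributions overlap but means differ. It uses the same illustrative assumptions as before: normal variables with unit variance, equal-sized disjoint sample sets, and a hypothetical function name and trial count.

```julia
using Random, Statistics

# ASSUMED setup: X_i ~ N(means[i], 1); each variable gets n samples in set A and
# n independent samples in set B. Returns the average value of μ^B_{a*}.
function double_estimator_mean(means::Vector{Float64}, n::Integer; trials = 200_000, seed = 0)
    rng = MersenneTwister(seed)
    total = 0.0
    for _ in 1:trials
        μA = [m + mean(randn(rng, n)) for m in means]   # estimates from set A
        μB = [m + mean(randn(rng, n)) for m in means]   # estimates from set B
        a_star = argmax(μA)          # pick the index using set A ...
        total += μB[a_star]          # ... but read its value from set B
    end
    return total / trials
end

double_estimator_mean([0.0, 0.0], 1)    # iid case: close to the true maximum 0
double_estimator_mean([0.0, -0.5], 1)   # overlapping, unequal means: below 0 (negative bias)
```

In the second case the true maximum is 0. The wrong index is picked with probability $\Phi(-0.5/\sqrt{2}) \approx 0.36$, which pulls the average to roughly $-0.18$.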

See below for a visualization of the bias removal for the iid case as well as the more formal proof for both methods.

The double estimator methods are the only ones that don't show an initial increase in the number of episodes. After enough time, though, every method starts to converge to the policy that takes a direct path. If $\alpha$ is not small enough, Q-learning fails to converge toward the optimal policy and its value estimates diverge. Both double methods are very stable and correctly estimate every state to have a negative value.

Returning to the definition of $\eta_t$, we can simplify further:

$$\eta_{t} \doteq V_{t+1}(S_{t+1}) - V_t(S_{t+1})$$

This quantity is the change in the value estimate of a state between two time steps. Note that between times $t$ and $t+1$ the only update performed is for the value at state $S_t$, using the equation:

$$V_{t+1}(S_t) = V_t(S_t) + \alpha \delta_t$$

If $S_{t+1} \neq S_t$, then the estimate at $S_{t+1}$ is not changed by the update at time $t$, so $V_{t+1}(S_{t+1}) = V_t(S_{t+1}) \implies \eta_{t} = 0$.

The only case in which $V_{t+1}(S_{t+1}) \neq V_t(S_{t+1})$ is when $S_t = S_{t+1} = S$. In this case, $V_{t+1}(S) = V_t(S) + \alpha \delta_t \implies V_{t+1}(S) - V_t(S) = \alpha \delta_t$

So we can rewrite $\eta_{t} = \alpha \delta_t \mathbb{1}_{t}$ where $\mathbb{1}_{t} = \begin{cases} 1 & \text{if } S_{t+1} = S_t \\ 0 & \text{otherwise} \end{cases}$

So the original equation can be written as:

$$\begin{flalign} G_t - V_t(S_t) &= \sum_{k=t}^{T-1} \gamma^{k-t} (\delta_k + \gamma \alpha \delta_k \mathbb{1}_k) \\ &= \sum_{k=t}^{T-1} \gamma^{k-t} \delta_k (1 + \gamma \alpha \mathbb{1}_k) \\ \end{flalign}$$

Here the first term is the sum from the original derivation, and the second term is nonzero only when a state appears twice consecutively in an episode.
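The corrected identity can be checked numerically. The Julia sketch below runs online TD(0) on one randomly generated episode and records $V_t$ before every update. It then reports the largest difference, over all $t$, between $G_t - V_t(S_t)$ and the right-hand side above. The 3-state episode, random rewards and arbitrary initial values are assumptions made purely for the check, and the function name and parameters are hypothetical.

```julia
using Random

# Hypothetical check: online TD(0) on one random episode over 3 nonterminal states,
# where consecutive repeats S_{k+1} == S_k are allowed.
function td_error_identity_gap(; α = 0.1, γ = 0.9, T = 20, seed = 1)
    rng = MersenneTwister(seed)
    S = rand(rng, 1:3, T)     # S[1..T] play the role of S_0 ... S_{T-1}
    R = randn(rng, T)         # R[k] is the reward received on leaving S[k]
    V = randn(rng, 3)         # arbitrary initial estimates
    Vsnap = Vector{Vector{Float64}}(undef, T)   # Vsnap[k] = estimates before step k's update
    δ = zeros(T)
    for k in 1:T
        Vsnap[k] = copy(V)
        vnext = k < T ? V[S[k+1]] : 0.0         # the terminal state has value 0
        δ[k] = R[k] + γ * vnext - V[S[k]]       # TD error computed with V_k
        V[S[k]] += α * δ[k]                     # online TD(0) update
    end
    isrepeat = [k < T && S[k+1] == S[k] for k in 1:T]   # indicator 1_k
    gap = 0.0
    for t in 1:T
        G = sum(γ^(k - t) * R[k] for k in t:T)
        lhs = G - Vsnap[t][S[t]]
        rhs = sum(γ^(k - t) * δ[k] * (1 + γ * α * isrepeat[k]) for k in t:T)
        gap = max(gap, abs(lhs - rhs))
    end
    return gap, count(isrepeat)
end

gap, nrepeats = td_error_identity_gap()
gap   # at floating-point round-off level
```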

Example 6.5: Windy Gridworld

6.5 Q-learning: Off-policy TD Control

One of the early breakthroughs in reinforcement learning was the development of an off-policy TD control algorithm known as Q-learning (Watkins, 1989), defined by

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma \text{max}_a Q(S_{t+1}, a) - Q(S_t, A_t)]$$
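A minimal Julia sketch of this update for a tabular problem follows. The matrix layout, the integer indexing of states and actions, the `terminal` flag and the default $\alpha$ and $\gamma$ are assumptions for illustration, not the notebook's own implementation. Unlike Sarsa, the target does not depend on which action is actually taken next.

```julia
# Tabular Q-learning update (sketch with hypothetical names). Q is an |S| × |A| matrix
# indexed by integer states and actions; `terminal = true` marks S_{t+1} as terminal.
function q_learning_update!(Q::Matrix{Float64}, s::Int, a::Int, r::Float64, s_next::Int;
                            α = 0.1, γ = 1.0, terminal = false)
    # bootstrap from the greedy (max) action in S_{t+1}; a terminal state has value 0
    target = terminal ? r : r + γ * maximum(@view Q[s_next, :])
    Q[s, a] += α * (target - Q[s, a])
    return Q
end

Q = zeros(2, 2)
Q[2, :] = [1.0, 3.0]
q_learning_update!(Q, 1, 1, 0.5, 2; α = 0.5, γ = 0.9)
Q[1, 1]   # 0.5 * (0.5 + 0.9 * 3.0) = 1.6
```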

Exercise 6.9 Windy Gridworld with King's Moves (programming)

Re-solve the windy gridworld assuming eight possible actions, including the diagonal moves, rather than four. How much better can you do with the extra actions? Can you do even better by including a ninth action that causes no movement at all other than that caused by the wind?
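The windy gridworld is deterministic, so the shortest achievable path for each action set can be checked directly with a breadth-first search instead of training an agent. The Julia sketch below uses the layout of Example 6.5: a 10 × 7 grid, start at column 1 row 4, goal at column 8 row 4, and upward wind strengths 0 0 0 1 1 1 2 2 1 0. The convention that the wind comes from the column being left, and all names, are assumptions rather than the notebook's implementation.

```julia
# Shortest deterministic path in the windy gridworld via breadth-first search.
# ASSUMPTIONS: 10 × 7 grid, start (1, 4), goal (8, 4), upward wind per column as in
# Example 6.5, and wind taken from the column the agent is leaving.
const WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]
const NX = 10
const NY = 7

function windy_step(pos::Tuple{Int,Int}, move::Tuple{Int,Int})
    x, y = pos
    dx, dy = move
    return (clamp(x + dx, 1, NX), clamp(y + dy + WIND[x], 1, NY))
end

function shortest_path_length(moves; start = (1, 4), goal = (8, 4))
    dist = Dict(start => 0)
    queue = [start]
    while !isempty(queue)
        pos = popfirst!(queue)
        pos == goal && return dist[pos]
        for m in moves
            nxt = windy_step(pos, m)
            if !haskey(dist, nxt)
                dist[nxt] = dist[pos] + 1
                push!(queue, nxt)
            end
        end
    end
    return -1   # goal unreachable
end

rook  = [(0, 1), (0, -1), (-1, 0), (1, 0)]
king  = vcat(rook, [(1, 1), (1, -1), (-1, 1), (-1, -1)])
king9 = vcat(king, [(0, 0)])   # ninth action: no movement other than the wind

shortest_path_length(rook)    # the four-action baseline, for comparison
shortest_path_length(king)    # 7
shortest_path_length(king9)   # 7
```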

Sarsa Implementation

Example 6.1: Driving Home

Figure 6.5:

With the initialization changed to 0, the RMS error decreases monotonically to its minimum, since the state values no longer pass through the correct values on their way to overshooting.

Let $X = \{ X_1, \dots, X_M \}$ be a set of random variables and let $\mu^A = \{\mu_1^A, \dots, \mu_M^A \}$ and $\mu^B = \{\mu_1^B, \dots, \mu_M^B\}$ be two independent sets of unbiased estimators such that $\mathbb{E} \{ \mu_i^A \} = \mathbb{E} \{ \mu_i^B \} = \mathbb{E} \{ X_i \}$ for all $i$. Let $\mathcal{M} \doteq \left \{ j \mid \mathbb{E} \{ X_j \} = \max_i \mathbb{E} \{ X_i \} \right \}$ be the set of labels of the variables that maximize the expected values of $X$. Let $a^*$ be an element that maximizes $\mu^A$: $\mu_{a^*}^A = \max_i \mu_i^A$. The claim is that:

$$\mathbb{E} \{ \mu_{a^*}^B \} = \mathbb{E} \{ X_{a^*} \} \leq \max_i \mathbb{E} \{ X_i \}$$

Furthermore, the inequality is strict if and only if $P(a^* \notin \mathcal{M}) \gt 0$.

Proof. Assume $a^* \in \mathcal{M}$. Then $\mathbb{E} \{ \mu_{a^*}^B\} = \mathbb{E} \{ X_{a^*}\} \doteq \max_i \mathbb{E} \{ X_i \}$. The first equality holds because $a^*$ is chosen using $\mu^A$ alone, so $\mu^B$ remains unbiased at that index. Now assume $a^* \notin \mathcal{M}$ and choose $j \in \mathcal{M}$. Then $\mathbb{E} \{ \mu_{a^*}^B \} = \mathbb{E} \{ X_{a^*}\} \lt \mathbb{E} \{ X_j \} \doteq \max_i \mathbb{E} \{ X_i \}$. These two cases are mutually exclusive and cover all possibilities, so the combined expression can be written as:

$$\begin{flalign} \mathbb{E} \{ \mu_{a^*}^B \} &= P(a^* \in \mathcal{M}) \mathbb{E} \{ \mu_{a^*}^B \vert a^* \in \mathcal{M} \} + P(a^* \notin \mathcal{M}) \mathbb{E} \{ \mu_{a^*}^B \vert a^* \notin \mathcal{M} \} \\ &= P(a^* \in \mathcal{M}) \max_i \mathbb{E} \{X_i \} + P(a^* \notin \mathcal{M}) \mathbb{E} \{ \mu_{a^*}^B \vert a^* \notin \mathcal{M} \} \\ &\leq P(a^* \in \mathcal{M}) \max_i \mathbb{E} \{X_i \} + P(a^* \notin \mathcal{M}) \max_i \mathbb{E} \{ X_i \} \\ &=\max_i \mathbb{E} \{ X_i \} \end{flalign}$$

The inequality is strict if and only if $P(a^* \notin \mathcal{M}) \gt 0$, where $\mathcal{M}$ is the true set of maximizing variables. This happens when the variables have different expected values but their distributions overlap. In contrast with the single estimator, the double estimator is unbiased when the variables are iid, since then all expected values are equal and $P(a^* \in \mathcal{M}) = 1$.

Value Iteration Results for Jack's Car Rental

Jack's Car Rental Code

Exercise 6.7

Design an off-policy version of the TD(0) update that can be used with arbitrary target policy $\pi$ and covering behavior policy $b$, using at each step $t$ the importance sampling ratio $\rho_{t:t}$ (5.3).

Recall that equation 5.3 defines:

$$\rho_{t:T-1} = \prod_{k=t}^{T-1}\frac{\pi(A_k|S_k)}{b(A_k|S_k)}$$

with the property that:

$$\mathbb{E}[\rho_{t:T-1}G_t \mid S_t = s] = v_\pi(s)$$

when $G_t$ is generated by the behavior policy.

The TD(0) update rule is given by:

$$V(S_t) \leftarrow V(S_t) + \alpha [R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$$

based on the following form of the Bellman equation:

$$v_\pi (s)=\text{E}_\pi[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s]$$

In the off-policy case, the reward $R_{t+1}$ and the subsequent state $S_{t+1}$ are generated by the behavior policy, but the subsequent value should still be based on the target-policy value function. Consider instead the quantity $q_\pi(s, a) = \mathbb{E} [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a]$. We have removed the policy from the expectation because, once the action is fixed, nothing in the bracket depends on sampling from the policy. Even if we choose actions $a$ according to a behavior policy that differs from the target policy, these estimates will be correct, because we are directly calculating the value of choosing that action, regardless of its probability. Consider following some behavior policy $b$ and recall that:

$$\begin{flalign} v_\pi(s) &= \sum_a \pi(a \vert s) q_\pi (s, a) \\ &= \sum_a \pi(a \vert s) \mathbb{E} [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a]\\ &= \mathbb{E}_\pi [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s]\\ \sum_a b(a \vert s) q_\pi (s, a) &= \sum_a b(a \vert s) \mathbb{E} [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a] \\ &= \mathbb{E}_b [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s]\\ \end{flalign}$$

The second quantity is not $v_b(s)$, since it weights $q_\pi$ rather than $q_b$. It is what the TD(0) target averages to when actions are drawn from $b$.

In the TD(0) update we do not calculate this expected value directly but instead average samples together that are drawn from the target policy. This sampling will produce samples weighted by the target policy probabilities thus mimicking the expected value sum. If instead, our samples are drawn from the behavior policy, then the samples will mimic the behavior policy probability weights instead of the target policy. So in order to correctly calculate the expected value we must multiply each behavior policy sample by $\frac{\pi(a \vert s)}{b(a \vert s)} = \frac{\pi(A_t \vert S_t)}{b(A_t \vert S_t)} = \rho_{t:t}$ resulting in the following update rule:

$$V(S_t) \leftarrow V(S_t) + \alpha [\rho_{t:t} \left ( R_{t+1} + \gamma V(S_{t+1}) \right ) - V(S_t)]$$
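To make the role of $\rho_{t:t}$ concrete, here is a minimal Python sketch of this update on a hypothetical one-step problem that is not part of the original notes: from a single state, action 0 pays 1 and action 1 pays 0, and both end the episode. The target policy always picks action 0, so $v_\pi = 1$, while the behavior policy is uniform. The function name, step size, and episode count are illustrative assumptions.

```python
import random


def off_policy_td0(episodes, pi, b, alpha=0.01, use_rho=True, seed=0):
    """Off-policy TD(0) on a hypothetical one-step problem.

    From state 0, action 0 gives reward 1 and action 1 gives reward 0;
    both lead to the terminal state, whose value is fixed at 0.
    pi and b hold the action probabilities of the target and behavior policies.
    """
    rng = random.Random(seed)
    V = {0: 0.0, "terminal": 0.0}
    gamma = 1.0
    for _ in range(episodes):
        s = 0
        a = 0 if rng.random() < b[0] else 1  # actions come from the behavior policy
        r = 1.0 if a == 0 else 0.0
        s_next = "terminal"
        rho = pi[a] / b[a] if use_rho else 1.0  # importance sampling ratio rho_{t:t}
        V[s] += alpha * (rho * (r + gamma * V[s_next]) - V[s])
    return V[0]


v_weighted = off_policy_td0(20000, pi=[1.0, 0.0], b=[0.5, 0.5])
v_unweighted = off_policy_td0(20000, pi=[1.0, 0.0], b=[0.5, 0.5], use_rho=False)
print(v_weighted, v_unweighted)
```

With the ratio, the estimate should hover near 1; dropping it (`use_rho=False`) makes the estimate settle near 0.5, which is the value of the uniform behavior policy in this toy problem.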


Wind values: [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]

Figure: Sarsa Solution and Value Iteration Solution (state-value grids, each with Actions and Wind Values panels).


Sarsa vs Q-learning vs Expected Sarsa Performance on Cliff Walking Example


6.7 Maximization Bias and Double Learning

All the control algorithms that we have discussed so far involve maximization in the construction of their target policies. For example, in Q-learning the target policy is the greedy policy given the current action values, which is defined with a max, and in Sarsa the policy is often $\epsilon$-greedy, which also involves a maximization operation. In these algorithms, a maximum over estimated values is used implicitly as an estimate of the maximum value, which can lead to significant positive bias. To see why, consider a single state $s$ where there are many actions $a$ whose true values, $q(s, a)$, are all zero but whose estimated values, $Q(s, a)$, are uncertain and thus distributed some above and some below zero. The maximum of the true values is zero, but the maximum of the estimates is positive, a positive bias. We call this maximization bias.

To elaborate on the bias, consider just two random variables $X \sim \mathcal{N}(\theta_1, 1)$ and $Y \sim \mathcal{N}(\theta_2, 1)$. We would like to estimate $\text{max} \left ( \mathbb{E}[X], \mathbb{E}[Y] \right ) = \text{max}(\theta_1, \theta_2)$, and using an approach analogous to our learning algorithms we would calculate $\max(\overline{X}, \overline{Y}) = \text{max} \left ( \sum_{i=1}^N \frac{x_i}{N}, \sum_{i=1}^M \frac{y_i}{M} \right )$. The problem with this approach is that for small numbers of samples the variance of each estimator is high, and we are using the same estimates both to select which random variable has the higher expected value and to estimate that value. Empirically, this results in a positive bias that gets worse the more variables we consider, as illustrated in the plot below.
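A quick Monte Carlo experiment reproduces this effect. The sketch below assumes every variable is $\mathcal{N}(0, 1)$, so the true maximum of the means is 0, and compares the single estimator $\max_i \overline{X}_i$ with a double estimator that uses one half of each variable's samples to pick the argmax and the other half to evaluate it (the idea behind Double Q-learning). The function name, sample sizes, and number of trials are illustrative choices.

```python
import random
import statistics


def estimate_bias(num_vars, samples_per_var, trials=2000, seed=0):
    """Average single- and double-estimator values of max_i E[X_i].

    Every X_i ~ N(0, 1), so the true maximum of the means is 0 and any
    systematic departure from 0 is bias.
    """
    rng = random.Random(seed)
    single, double = [], []
    half = samples_per_var // 2
    for _ in range(trials):
        data = [[rng.gauss(0.0, 1.0) for _ in range(samples_per_var)]
                for _ in range(num_vars)]
        # Single estimator: the same sample means select and evaluate the max.
        single.append(max(statistics.fmean(d) for d in data))
        # Double estimator: the first half selects the argmax, the second half values it.
        selection = [statistics.fmean(d[:half]) for d in data]
        best = max(range(num_vars), key=lambda i: selection[i])
        double.append(statistics.fmean(data[best][half:]))
    return statistics.fmean(single), statistics.fmean(double)


for k in (2, 5, 10):
    s, d = estimate_bias(k, samples_per_var=10)
    print(f"{k:2d} variables: single {s:+.3f}, double {d:+.3f}")
```

The single estimator's average should be clearly positive and larger for 10 variables than for 2, while the double estimator's average stays near 0 here because its evaluation samples are independent of the selection.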


Exercise 6.5

In the right graph of the random walk example, the RMS error of the TD method seems to go down and then up again, particularly at high $\alpha$'s. What could have caused this? Do you think this always occurs, or might it be a function of how the approximate value function was initialized?

Since the value function was initialized at the correct value for the center state, all of the values to the right must increase and the values to the left must decrease to reach the true values. Episodes that terminate on the right receive a reward of 1 and push up the rightmost estimate, while episodes that terminate on the left receive a reward of 0 and push down the leftmost estimate. The correct values for the leftmost and rightmost estimates are $\frac{1}{6}$ and $\frac{5}{6}$ respectively. Since the walk is equally likely to exit on the right or the left, both ends of the value estimates are updated at roughly the same rate. That means both ends of the chain move toward their correct values at about the same time, and if those updates stay somewhat synchronized, all of the states pass through their correct values at a similar time. What happens at that point if $\alpha=0.15$? Consider an update for state E, assuming its estimate is already correct: $V(E) \leftarrow \frac{5}{6} + 0.15[1 - \frac{5}{6}] \approx 0.858 \gt \frac{5}{6}$. A similar effect pushes state A below its correct value. The larger $\alpha$ is, the more each such transition over-corrects, and the feedback from neighboring states is not enough to pull the estimates back to the correct values. Because the estimates pass through, or very close to, the correct values on the way, the error passes through a minimum before the estimates over- or undershoot.
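The overshoot arithmetic above generalizes directly: starting from the correct value, one right-exiting update moves $V(E)$ to $\frac{5}{6} + \frac{\alpha}{6}$ and one left-exiting update moves $V(A)$ to $\frac{1}{6}(1-\alpha)$, so the size of the overshoot grows linearly in $\alpha$. A tiny Python check (the function names are illustrative):

```python
def overshoot_E(alpha):
    """V(E) after one right exit (reward 1, terminal value 0), starting from 5/6."""
    v = 5 / 6
    return v + alpha * (1.0 + 0.0 - v)


def undershoot_A(alpha):
    """V(A) after one left exit (reward 0, terminal value 0), starting from 1/6."""
    v = 1 / 6
    return v + alpha * (0.0 + 0.0 - v)


for alpha in (0.05, 0.1, 0.15):
    print(alpha, round(overshoot_E(alpha), 4), round(undershoot_A(alpha), 4))
```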

If we had instead initialized the state values at 0, then the estimate at A would start below its true value and would not be corrected until information from the right side propagated through. State E, however, would receive large updates from every episode that exits to the right, while the values of the states to its left would still be too low. Since the estimates no longer move symmetrically, we would not get the same synchronized pass through the minimum error: by the time the E estimate is correct, A will still have a large error. In this case, we are more likely to see the error continue to fall as more updates occur. Below is a visualization of the state estimates at different stages of training with the original initialization and with a 0 initialization. In the 0 case, the left-side estimates take a long time to reach their correct values, whereas with the original initialization all the estimates approach the correct values roughly together.
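The two initializations can be compared with a short simulation of the five-state random walk. This is a sketch under stated assumptions rather than the notebook's own code: states A to E are indices 1 to 5, indices 0 and 6 are terminal, episodes start in C, $\gamma = 1$, and the RMS error over the five states is averaged across independent runs, as in the book's figure. `td0_random_walk` is a hypothetical name.

```python
import math
import random

TRUE_VALUES = [i / 6 for i in range(1, 6)]  # true values of A..E


def rms_error(estimates):
    return math.sqrt(sum((v - t) ** 2 for v, t in zip(estimates, TRUE_VALUES)) / 5)


def td0_random_walk(alpha, init, episodes=100, runs=100, seed=0):
    """Average RMS error after each episode (index 0 = before any learning)."""
    rng = random.Random(seed)
    curve = [0.0] * (episodes + 1)
    for _ in range(runs):
        # Indices 0 and 6 are the terminal states, valued at 0.
        V = [0.0] + [init] * 5 + [0.0]
        curve[0] += rms_error(V[1:6]) / runs
        for ep in range(1, episodes + 1):
            s = 3  # every episode starts in C
            while s not in (0, 6):
                s_next = s + (1 if rng.random() < 0.5 else -1)
                r = 1.0 if s_next == 6 else 0.0
                V[s] += alpha * (r + V[s_next] - V[s])  # TD(0) with gamma = 1
                s = s_next
            curve[ep] += rms_error(V[1:6]) / runs
    return curve


center_init = td0_random_walk(0.15, init=0.5)
zero_init = td0_random_walk(0.15, init=0.0)
```

Plotting `center_init` against `zero_init` is one way to check whether the dip-then-rise shape depends on the initialization; the exact curves vary with the seed and the number of runs.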


Example 6.7: Maximization Bias Example

Consider an MDP with two non-terminal states, A and B. Episodes always start in state A, where there are two actions, left and right. Choosing right always yields a reward of 0 and terminates the episode. Choosing left transitions to state B, from which there are many actions, all of which terminate the episode with a random reward drawn from $\mathcal{N}(-0.1, 1)$. The estimated value of (A, right) will always be 0 since that is the only possible sample. The estimated value of (A, left), however, has higher variance and an expected value of -0.1. The problem with Q-learning is that, due to maximization bias, (A, left) will have a higher value estimate when few samples have been collected, since it is very likely that one of the state-action pairs from B will produce a reward greater than 0. The more such actions there are, the worse the bias and the more samples are needed to remove it. If we employ Double Q-learning instead, this maximization bias is essentially eliminated.
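Below is a compact Python sketch of this experiment that compares Q-learning with Double Q-learning by the fraction of runs choosing left from A in each episode. The 10 actions in B, $\alpha = 0.1$, $\epsilon = 0.1$, $\gamma = 1$, and the number of runs are assumptions chosen for illustration. For Double Q-learning the behavior is $\epsilon$-greedy on $Q_1 + Q_2$, and each step updates one randomly chosen table while the other values its argmax.

```python
import random


def left_fraction(double, episodes=300, runs=200, n_b_actions=10,
                  alpha=0.1, eps=0.1, seed=0):
    """Fraction of runs that choose left in A, per episode, for the A/B example.

    Assumed setup: A has actions 0 (left -> B, reward 0) and 1 (right ->
    terminal, reward 0); every action in B ends the episode with a reward
    drawn from N(-0.1, 1). gamma = 1 throughout.
    """
    rng = random.Random(seed)
    left_counts = [0] * episodes
    for _ in range(runs):
        q1 = {"A": [0.0, 0.0], "B": [0.0] * n_b_actions}
        q2 = {"A": [0.0, 0.0], "B": [0.0] * n_b_actions}  # stays zero for Q-learning
        for ep in range(episodes):
            s = "A"
            while s is not None:
                # Epsilon-greedy on Q1 + Q2 (this is just Q1 for plain Q-learning).
                qs = [x + y for x, y in zip(q1[s], q2[s])]
                if rng.random() < eps:
                    a = rng.randrange(len(qs))
                else:
                    best = max(qs)
                    a = rng.choice([i for i, v in enumerate(qs) if v == best])
                if s == "A":
                    left_counts[ep] += int(a == 0)
                    r, s_next = 0.0, ("B" if a == 0 else None)
                else:
                    r, s_next = rng.gauss(-0.1, 1.0), None
                if not double:
                    # Q-learning: bootstrap from the max of the same table.
                    bootstrap = max(q1[s_next]) if s_next else 0.0
                    q1[s][a] += alpha * (r + bootstrap - q1[s][a])
                else:
                    # Double Q-learning: one table picks the action, the other values it.
                    qa, qb = (q1, q2) if rng.random() < 0.5 else (q2, q1)
                    if s_next:
                        a_star = max(range(n_b_actions), key=lambda i: qa[s_next][i])
                        bootstrap = qb[s_next][a_star]
                    else:
                        bootstrap = 0.0
                    qa[s][a] += alpha * (r + bootstrap - qa[s][a])
                s = s_next
    return [c / runs for c in left_counts]


q_learning = left_fraction(double=False)
double_q = left_fraction(double=True)
```

Averaged over runs, the Q-learning curve should sit well above the Double Q-learning curve in the early episodes and decay only slowly, reflecting the maximization bias described above.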


6.8 Games, Afterstates, and Other Special Cases

In the tic-tac-toe example we considered learning a value function for a state after the player's move but before the opponent's response. This type of state is called an afterstate. Afterstates are useful when we know part of an environment's dynamics but the rest is stochastic or unknown. For example, we typically know the immediate effect of our moves, but not necessarily what happens after that.

It can be more efficient to learn based on afterstates because there are fewer values to represent than if we need to learn the full action value function. Any state-action pair that maps to the same afterstate would be represented by a single value. These afterstate value functions can also be learned with generalized policy iteration.
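Here is a minimal Python sketch of an afterstate value table for tic-tac-toe, assuming boards are 9-character strings with `.` for empty squares. Two different (position, move) pairs that produce the same board share one entry, and the TD update is applied between successive afterstates. The helper names are hypothetical, not taken from the notebook.

```python
from collections import defaultdict

# Afterstate values, keyed by the board *after* our move (9-char string, "." = empty).
V = defaultdict(float)


def afterstate(board, move, player):
    """Board that results from `player` marking square `move` (0-8)."""
    cells = list(board)
    cells[move] = player
    return "".join(cells)


def greedy_move(board, player):
    """Pick the legal move whose afterstate currently has the highest value."""
    legal = [i for i, c in enumerate(board) if c == "."]
    return max(legal, key=lambda m: V[afterstate(board, m, player)])


def td_afterstate_update(prev_after, reward, next_after, alpha=0.1):
    """TD(0) backup between consecutive afterstates; next_after=None at game end."""
    target = reward + (V[next_after] if next_after is not None else 0.0)
    V[prev_after] += alpha * (target - V[prev_after])
```

For instance, `afterstate("X...O....", 1, "X")` and `afterstate(".X..O....", 0, "X")` both give `"XX..O...."`, so a value learned for that board is reused from both positions, whereas an action-value table would store two separate entries.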

Sarsa backup diagram. Black circles represent actions and white circles represent states.


The following represents a trajectory taken by a policy in an environment. We seek to estimate $q_\pi(s, a)$ for the current behavior policy $\pi$ using the same TD method introduced above. The update rule now, however, estimates the values of state-action pairs rather than of the states themselves.
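The Sarsa update $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha[R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)]$ can be applied along a recorded trajectory as in the sketch below. It assumes the trajectory is a list of $(S_t, A_t, R_{t+1})$ tuples ending with a terminal transition, whose bootstrapped value is taken to be 0; the function names and the example trajectory are illustrative.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=1.0, terminal=False):
    """One Sarsa backup on Q[(s, a)] from (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})."""
    next_q = 0.0 if terminal else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_q - Q[(s, a)])
    return Q[(s, a)]


def sarsa_over_trajectory(Q, steps, alpha=0.5, gamma=1.0):
    """Apply Sarsa along `steps`, a list of (S_t, A_t, R_{t+1}) ending at a terminal."""
    for t, (s, a, r) in enumerate(steps):
        if t + 1 < len(steps):
            s_next, a_next, _ = steps[t + 1]
            sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma)
        else:
            sarsa_update(Q, s, a, r, None, None, alpha, gamma, terminal=True)
    return Q


Q = defaultdict(float)  # unseen state-action pairs start at 0
trajectory = [("s0", "right", -1.0), ("s1", "up", -1.0)]
sarsa_over_trajectory(Q, trajectory)
```

Because each backup uses the action actually taken next, running the same trajectory again propagates the second step's updated value back to the first pair.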


Figure: Q-learning Solution and Double Q-learning Solution (state-value grids, each with Actions and Wind Values panels).


Because TD(0) bases its update in part on an existing estimate, we say that it is a bootstrapping method, like DP. We know from Chapter 3 that

$$\begin{flalign} v_\pi(s) & \doteq \mathbb{E}_\pi[G_t \mid S_t = s] \tag{6.3}\\ &= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \tag{from (3.9)}\\ &=\mathbb{E}_\pi[R_{t+1} + \gamma v_\pi (S_{t+1}) \mid S_t = s] \tag{6.4} \end{flalign}$$

Roughly speaking, Monte Carlo methods use an estimate of (6.3) as a target, whereas DP methods use an estimate of (6.4) as a target. The Monte Carlo target is an estimate because the expected value in (6.3) is not known; a sample return is used in place of the real expected return. The DP target is an estimate not because of the expected values, which are assumed to be completely provided by a model of the environment, but because $v_\pi(S_{t+1})$ is not known and the current estimate, $V(S_{t+1})$, is used instead. The TD target is an estimate for both reasons: it samples the expected values in (6.4) and it uses the current estimate $V$ instead of the true $v_\pi$. Thus, TD methods combine the sampling of Monte Carlo with the bootstrapping of DP.

TD and Monte Carlo updates are both referred to as sample updates because they involve looking ahead to a sample successor state (or state-action pair). The expected updates used in DP methods rely on the complete distribution of all possible successor states rather than a single sample.

Note that the quantity in the brackets in (6.2) is a sort of error, measuring the difference between the estimated value of $S_t$ and the better estimate $R_{t+1} + \gamma V(S_{t+1})$. This quantity is called the TD error:

$$\delta_t \doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \tag{6.5}$$

The TD error depends on the subsequent state so it is not available until one step later. That is to say $\delta_t$ is not known until time $t+1$. Also note that if we do not update $V$ during an episode (as we do not in Monte Carlo methods), then the Monte Carlo error can be written as the sum of TD errors:

$$\begin{flalign} G_t - V(S_t) &= R_{t+1} + \gamma G_{t+1} - V(S_t) + \gamma V(S_{t+1}) - \gamma V(S_{t+1}) \tag{from (3.9)} \\ &=\delta_t + \gamma(G_{t+1} - V(S_{t+1})) \tag{a}\\ &=\delta_t + \gamma \left ( \delta_{t+1} + \gamma(G_{t+2} - V(S_{t+2})) \right ) \tag{using (a)}\\ &=\delta_t + \gamma \delta_{t+1} + \gamma^2 \left ( G_{t+2} - V(S_{t+2}) \right ) \\ &=\delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1}\delta_{T-1} + \gamma^{T-t}(G_T - V(S_T)) \tag{applying (a) until termination}\\ &=\delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1}\delta_{T-1} + \gamma^{T-t}(0-0) \tag{definition of terminal state}\\ &=\sum_{k=t}^{T-1} \gamma^{k-t} \delta_k \tag{6.6} \end{flalign}$$

This identity is not exact if $V$ is updated during the episode (as it is in TD(0)), but if the step size is small then it may still hold approximately.
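Identity (6.6) is easy to check numerically when $V$ is held fixed. The sketch below draws a random episode and a random value table (both purely illustrative), computes the TD errors (6.5), and compares their discounted sum with the Monte Carlo error $G_0 - V(S_0)$; the function name and parameters are assumptions.

```python
import random


def mc_error_and_td_sum(T=15, n_states=5, gamma=0.9, seed=0):
    """Return (G_0 - V(S_0), sum_k gamma^k delta_k) for a random episode and a fixed V."""
    rng = random.Random(seed)
    V = [rng.uniform(-1, 1) for _ in range(n_states)]  # not updated during the episode
    states = [rng.randrange(n_states) for _ in range(T)]  # S_0 .. S_{T-1}
    rewards = [rng.uniform(-1, 1) for _ in range(T)]  # rewards[k] is R_{k+1}

    def v(k):
        # V(S_k), with the terminal state S_T valued at 0
        return 0.0 if k == T else V[states[k]]

    deltas = [rewards[k] + gamma * v(k + 1) - v(k) for k in range(T)]  # (6.5)
    g0 = sum(gamma ** k * rewards[k] for k in range(T))
    return g0 - v(0), sum(gamma ** k * deltas[k] for k in range(T))


mc_error, td_sum = mc_error_and_td_sum()
print(mc_error, td_sum)  # equal up to floating-point rounding
```

The agreement is exact apart from rounding because the $\gamma^{k+1} V(S_{k+1})$ and $\gamma^k V(S_k)$ terms telescope, which is the same cancellation used in the derivation above.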


Exercise 6.1

If $V$ changes during the episode, then (6.6) only holds approximately; what would the difference be between the two sides? Let $V_t$ denote the array of state values used at time $t$ in the TD error (6.5) and in the TD update (6.2). Redo the derivation above to determine the additional amount that must be added to the sum of TD errors in order to equal the Monte Carlo error.


Now we can rewrite the Monte Carlo error using (3.9) again and proceed with the derivation, keeping track of the time index of the value estimates:

$$\begin{flalign} G_t - V_t(S_t) &= R_{t+1} + \gamma G_{t+1} - V_t(S_t) + \gamma V_{t}(S_{t+1}) - \gamma V_{t}(S_{t+1}) \tag{from (3.9)}\\ &= \delta_t + \gamma \left [ G_{t+1} - V_t(S_{t+1}) \right ] \\ &= \delta_t + \gamma \left [ G_{t+1} - V_{t+1}(S_{t+1}) + V_{t+1}(S_{t+1}) - V_t(S_{t+1}) \right ] \\ \end{flalign}$$

Define the following:

$$\eta_{t} \doteq V_{t+1}(S_{t+1}) - V_t(S_{t+1})$$

which lets us rewrite the equation as

$$G_t - V_t(S_t) = \delta_t + \gamma \eta_{t} + \gamma \left [ G_{t+1} - V_{t+1}(S_{t+1})\right ]$$

Notice that the term in the brackets is the left-hand side shifted forward one time step. That implies the equation can be expanded recursively, as we did in the original derivation.
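Unrolling that recursion until termination, where $G_T - V_T(S_T) = 0$, gives $G_t - V_t(S_t) = \sum_{k=t}^{T-1} \gamma^{k-t}(\delta_k + \gamma \eta_k)$ with $\delta_k = R_{k+1} + \gamma V_k(S_{k+1}) - V_k(S_k)$. The Python sketch below checks this numerically by running TD(0) along a random episode and storing the snapshot $V_t$ used at each step. The states, rewards, step size, and function name are hypothetical choices for illustration. With $\alpha = 0$ every $\eta_k$ vanishes and the plain sum of TD errors from (6.6) is recovered.

```python
import random


def exercise_6_1_check(T=20, n_states=3, alpha=0.5, gamma=0.9, seed=0):
    """Return (MC error, plain TD-error sum, eta-corrected sum) for t = 0."""
    rng = random.Random(seed)
    states = [rng.randrange(n_states) for _ in range(T)] + [None]  # S_T is terminal
    rewards = [rng.uniform(-1, 1) for _ in range(T)]  # rewards[k] is R_{k+1}
    V = [rng.uniform(-1, 1) for _ in range(n_states)]

    def val(table, s):
        return 0.0 if s is None else table[s]  # terminal value is always 0

    snapshots, deltas = [], []
    for t in range(T):
        snapshots.append(list(V))  # V_t, the values used at time t
        d = rewards[t] + gamma * val(V, states[t + 1]) - V[states[t]]  # delta_t
        deltas.append(d)
        V[states[t]] += alpha * d  # TD(0) update produces V_{t+1}
    snapshots.append(list(V))  # V_T

    etas = [val(snapshots[t + 1], states[t + 1]) - val(snapshots[t], states[t + 1])
            for t in range(T)]
    g0 = sum(gamma ** k * rewards[k] for k in range(T))
    mc_error = g0 - val(snapshots[0], states[0])
    plain_sum = sum(gamma ** k * deltas[k] for k in range(T))
    corrected_sum = sum(gamma ** k * (deltas[k] + gamma * etas[k]) for k in range(T))
    return mc_error, plain_sum, corrected_sum


print(exercise_6_1_check())
```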


Dependencies and Settings


Formal Proof for Bias

Int64}¨elements›’’ƒ¨elements’’’¢11ªtext/plain’’¢17ªtext/plain¤type¥Tuple¨objectid°49ec9371b177a25dÙ!application/vnd.pluto.tree+object’£249ªtext/plain’’ƒ¨elements’’’¢16ªtext/plain’’¢14ªtext/plain¤type¥Tuple¨objectid°d93d095a02371a59Ù!application/vnd.pluto.tree+object’£351ªtext/plain’’ƒ¨elements’’’¢18ªtext/plain’’¢16ªtext/plain¤type¥Tuple¨objectid°aeb6f295858259dbÙ!application/vnd.pluto.tree+object’£395ªtext/plain’’ƒ¨elements’’’¢17ªtext/plain’’¢12ªtext/plain¤type¥Tuple¨objectid¯68544eea78f6641Ù!application/vnd.pluto.tree+object’£370ªtext/plain’’ƒ¨elements’’’¡8ªtext/plain’’¢15ªtext/plain¤type¥Tuple¨objectid°ceff527f41a09840Ù!application/vnd.pluto.tree+object’£184ªtext/plain’’ƒ¨elements’’’¢16ªtext/plain’’¢16ªtext/plain¤type¥Tuple¨objectid°3164689f12bc7404Ù!application/vnd.pluto.tree+object’£353ªtext/plain’’ƒ¨elements’’’¢19ªtext/plain’’¢14ªtext/plain¤type¥Tuple¨objectid°cb90bf273945b2c8Ù!application/vnd.pluto.tree+object’£414ªtext/plain’’ƒ¨elements’’’¡7ªtext/plain’’¢18ªtext/plain¤type¥Tuple¨objectid°f3c6affef4f32144Ù!application/vnd.pluto.tree+object’£166ªtext/plain’’ƒ¨elements’’’¡7ªtext/plain’’¡8ªtext/plain¤type¥Tuple¨objectid°300559d2f34a9666Ù!application/vnd.pluto.tree+object’£156ªtext/plain’’ƒ¨elements’’’¢14ªtext/plain’’¢15ªtext/plain¤type¥Tuple¨objectid°ac753ed572b44c1dÙ!application/vnd.pluto.tree+object’£310ªtext/plain¤more¤type¤Dict¬prefix_short¤Dict¨objectid°a7eb00b000659f24Ù!application/vnd.pluto.tree+object’¬action_index’…¦prefix²Dict{Int64, 
Int64}¨elements›’’¡5ªtext/plain’¢11ªtext/plain’’¢-3ªtext/plain’¡3ªtext/plain’’¡1ªtext/plain’¡7ªtext/plain’’¡0ªtext/plain’¡6ªtext/plain’’¡4ªtext/plain’¢10ªtext/plain’’¢-5ªtext/plain’¡1ªtext/plain’’¢-1ªtext/plain’¡5ªtext/plain’’¡2ªtext/plain’¡8ªtext/plain’’¢-2ªtext/plain’¡4ªtext/plain’’¢-4ªtext/plain’¡2ªtext/plain¤more¤type¤Dict¬prefix_short¤Dict¨objectid°2c82fa6389959ad1Ù!application/vnd.pluto.tree+object¤type¦struct¬prefix_short³FiniteAfterstateMDP¨objectid°e4c5edf99f3d49c7¤mimeÙ!application/vnd.pluto.tree+object¬rootassignee¾const jacks_car_afterstate_mdp²last_run_timestampËAÚšÿy°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1¹depends_on_disabled_cellsÂ§runtimeÎ°î¼µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$c4719c42-87aa-482a-95aa-a1492d42835dŠ¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ:

Stochastic Gridworld


Monte Carlo nonstationary update rule for value function

$$V(S_t) \leftarrow V(S_t) + \alpha [G_t - V(S_t)] \tag{6.1}$$

where $G_t$ is the actual return following time $t$, and $\alpha$ is a constant step-size parameter. Call this method constant-$\alpha$ MC. Using a constant step size $\alpha$ instead of the usual sample average is what makes this estimation method suitable for nonstationary problems. Because $G_t$ is known only once the episode has ended, this method must wait until the end of an episode before it can update.
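As a minimal sketch of (6.1), here is a constant-$\alpha$ MC update applied to one finished episode. The function name `constant_alpha_mc!` is hypothetical and is not the notebook's code. It assumes states are integer indices into `V` and that `rewards[t]` holds $R_{t+1}$. The returns are accumulated backwards, which is why the whole episode must be available first.

```julia
# Constant-α MC update (6.1), applied once an episode has finished.
# Hypothetical helper: states[t] is the integer index of S_t in V, and
# rewards[t] is R_{t+1}, the reward received on leaving S_t.
function constant_alpha_mc!(V::Vector{Float64}, states::Vector{Int},
                            rewards::Vector{Float64}, α::Float64, γ::Float64)
    G = 0.0
    # Sweep backwards so that G_t = R_{t+1} + γ G_{t+1} is built up in one pass.
    for t in length(states):-1:1
        G = rewards[t] + γ * G
        V[states[t]] += α * (G - V[states[t]])   # every-visit update toward G_t
    end
    return V
end

# Usage: a three-step episode that ends with reward 1.
V = zeros(3)
constant_alpha_mc!(V, [1, 2, 3], [0.0, 0.0, 1.0], 0.1, 0.9)
# V is now approximately [0.081, 0.09, 0.1]
```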

In contrast, TD methods need only wait until the next time step to perform an update. The simplest TD method uses the following update rule:

$$V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)] \tag{6.2}$$

where the update can be made immediately on transition to $S_{t+1}$, after receiving $R_{t+1}$. This method is called TD(0), or one-step TD. See below for code implementing this.
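For comparison with the MC version, a single TD(0) step from (6.2) might look like the sketch below. The name `td0_update!` and the `terminal` keyword are hypothetical. A terminal next state is assumed to have value 0. The update needs only $(S_t, R_{t+1}, S_{t+1})$, not the rest of the episode.

```julia
# One-step TD(0) update (6.2). Hypothetical helper: s and s_next are integer
# state indices into V; when `terminal` is true, V(S_{t+1}) is taken to be 0.
function td0_update!(V::Vector{Float64}, s::Int, r::Float64, s_next::Int,
                     α::Float64, γ::Float64; terminal::Bool = false)
    v_next = terminal ? 0.0 : V[s_next]
    δ = r + γ * v_next - V[s]    # TD error
    V[s] += α * δ                # move V(S_t) a fraction α toward the TD target
    return δ
end

V = [0.0, 1.0]
δ = td0_update!(V, 1, 0.5, 2, 0.1, 0.9)
# δ ≈ 1.4 and V[1] ≈ 0.14, computed right after a single transition
```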


Batch Method Estimation Implementation


Actions


Figure: Sarsa Solution and Value Iteration Solution (panels: Actions, Wind Values)


Exercise 6.14

Describe how the task of Jack's Car Rental (Example 4.2) could be reformulated in terms of afterstates. Why, in terms of this specific task, would such a reformulation be likely to speed convergence?

In the original problem the state is the number of cars at each location at the end of the day. The actions are the net numbers of cars moved between the two locations overnight. With an afterstate approach, the value function would only consider the number of cars after the movement is performed. This is equivalent to valuing the state the following morning, when customers begin renting and returning cars.

Whether the next day's random rentals and returns turn out well or badly depends only on the cars available at each location at the start of that day, that is, on the afterstate. This approach would likely converge faster because the value function models exactly the quantity that determines whether cars will be available. As in the tic-tac-toe example, many state-action pairs lead to the same afterstate. For example, 10 cars at each location with 2 cars moved gives the same morning configuration as 9 and 11 cars with 1 car moved. The value of what follows the move is the same for all such pairs, so an afterstate value function learns it once instead of separately for each pair. See below for code that creates the car rental MDP and solves it using value iteration with afterstates.
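To make the many-to-one mapping concrete, here is a small sketch of an afterstate function. It is hypothetical and separate from the notebook's `afterstate_map`. It assumes a move is truncated to the cars actually available and that cars above 20 at a location are removed. Under these assumptions the 21 × 21 × 11 state-action pairs collapse onto only 21 × 21 afterstates.

```julia
# Hypothetical sketch of afterstates for Jack's Car Rental (not the notebook's code).
# State (n1, n2): cars at each location at the end of the day.
# Action a: net cars moved overnight from location 1 to location 2, with -5 ≤ a ≤ 5.
# Assumptions: a move is truncated to the cars actually available, and any cars
# above max_cars at a location are removed from the problem.
function afterstate(n1::Int, n2::Int, a::Int; max_cars::Int = 20)
    moved = a >= 0 ? min(a, n1) : -min(-a, n2)
    return (min(n1 - moved, max_cars), min(n2 + moved, max_cars))
end

# Two different state-action pairs that lead to the same morning configuration.
afterstate(10, 10, 2)   # (8, 12)
afterstate(9, 11, 1)    # (8, 12)

# Count the distinct afterstates reached by all state-action pairs.
sa_pairs = [(n1, n2, a) for n1 in 0:20, n2 in 0:20, a in -5:5]
n_afterstates = length(unique(afterstate(p...) for p in sa_pairs))
```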


In his paper, van Hasselt proposes an alternative, the double estimator, to correct this bias in approximating $\max_i \mathbb{E} \{ X_i \}$. It uses two sets of estimators: $\mu^A = \{ \mu_1^A, \dots, \mu_M^A \}$ and $\mu^B = \{ \mu_1^B, \dots, \mu_M^B \}$.

Each set of estimators is computed from its own subset of the samples we draw, such that $S = S^A \cup S^B$, $S^A \cap S^B = \emptyset$, $\mu_i^A(S) = \frac{1}{\vert S_i^A \vert } \sum_{s \in S_i^A} s$ and $\mu_i^B(S) = \frac{1}{\vert S_i^B \vert } \sum_{s \in S_i^B} s$. Like the single estimator $\mu_i$, both $\mu_i^A$ and $\mu_i^B$ are unbiased if we assume that the samples are split in a proper manner, for instance randomly over the two sets of estimators. Let $\text{Max}^A (S) \doteq \{ j \mid \mu_j^A (S) = \max_i \mu_i^A (S) \}$ be the set of maximal estimates in $\mu^A(S)$. Since $\mu^B$ is an independent, unbiased set of estimators, we have $\mathbb{E} \{ \mu_j^B \} = \mathbb{E} \{ X_j \}$ for all $j$, including all $j \in \text{Max}^A$. Let $a^*$ be an estimator that maximizes $\mu^A$, that is, $\mu_{a^*}^A(S) \doteq \max_i \mu_i^A (S)$. If there are multiple estimators that maximize $\mu^A$, we can for instance pick one at random. Then we can use $\mu_{a^*}^B$ as an estimate for $\max_i \mathbb{E} \{ \mu_i^B \}$, and therefore also for $\max_i \mathbb{E} \{ X_i \}$, and we obtain the approximation

$$\max_i \mathbb{E} \{ X_i \} = \max_i \mathbb{E} \{ \mu_i^B \} \approx \mu_{a^*}^B \tag{e}$$

As we gain more samples the variance of the estimators decreases. In the limit, $\mu_i^A(S) = \mu_i^B(S) = \mathbb{E} \{ X_i \}$ for all $i$ and the approximation in $(e)$ converges to the correct result.
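The bias of each estimator can be checked numerically with a small simulation. The setup is an assumption chosen for illustration: $M = 10$ variables that are all $N(0, 1)$, so the true $\max_i \mathbb{E}\{X_i\} = 0$, and 10 samples per variable per trial. For the double estimator, those samples are split evenly into $S^A$ and $S^B$. The single estimator (the maximum of the pooled sample means) should come out clearly positive. Because all the means are equal here, the double estimator should come out close to 0.

```julia
using Random, Statistics

# Simulation of the single vs. double estimator of max_i E{X_i}.
# Illustrative assumptions: M variables that are all N(0, 1), so the true
# max_i E{X_i} is 0; each trial draws n samples per variable and splits them
# evenly (first half S^A, second half S^B).
function compare_estimators(; M = 10, n = 10, trials = 10_000, seed = 1)
    rng = MersenneTwister(seed)
    single = zeros(trials)
    double = zeros(trials)
    half = n ÷ 2
    for k in 1:trials
        X = randn(rng, n, M)                          # column i: samples of X_i
        μ  = vec(mean(X; dims = 1))                   # single estimator, all samples
        μA = vec(mean(X[1:half, :]; dims = 1))        # μ^A from S^A
        μB = vec(mean(X[half+1:end, :]; dims = 1))    # μ^B from S^B
        single[k] = maximum(μ)                        # max_i μ_i
        a_star = argmax(μA)                           # ties have probability 0 here
        double[k] = μB[a_star]                        # μ^B_{a*}
    end
    return mean(single), mean(double)
end

single_est, double_est = compare_estimators()
```

Making the true means unequal should push the double estimator's average below the true maximum, consistent with the underestimation discussed in this section.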

Assume that the underlying PDFs are continuous. The probability $P(j = a^*)$ for any $j$ is then equal to the probability that all $i \neq j$ give lower estimates. Thus $\mu_j^A(S) = x$ is maximal for some value $x$ with probability $\prod_{i \neq j}^M P(\mu_i ^A \lt x)$. Integrating out $x$ gives $P(j = a^*) = \int_{-\infty}^\infty P(\mu_j^A = x) \prod_{i \neq j}^M P(\mu_i^A < x)\,dx \doteq \int_{-\infty}^\infty f_j^A(x) \prod_{i \neq j}^M F_i^A(x)\,dx$, where $f_i^A$ and $F_i^A$ are the PDF and CDF of $\mu_i^A$. The expected value of the approximation by the double estimator is thus given by

$$\sum_j^M P(j = a^*) \mathbb{E} \{ \mu_j^B \} = \sum_j^M \mathbb{E} \{ \mu_j ^B \} \int_{-\infty}^\infty f_j^A(x) \prod_{i \neq j} F_i^A(x)dx \tag{f}$$

For discrete PDFs the probability that two or more estimators are equal should be taken into account and the integrals should be replaced with sums.

Comparing (f) to (c), we see that the difference is that the double estimator uses $\mathbb{E} \{ \mu_j^B \}$ in place of $x$. The single estimator overestimates because $x$ is within the integral and therefore correlates with the monotonically increasing product $\prod_{i \neq j} F_i^\mu(x)$. The double estimator underestimates because the probabilities $P(j = a^*)$ sum to one. The approximation is therefore a weighted estimate of unbiased expected values, which must be lower than or equal to the maximum expected value. In the following lemma, which holds in both the discrete and the continuous case, we prove in general that the estimate $\mathbb{E} \{ \mu_{a^*}^B \}$ is not an unbiased estimate of $\max_i \mathbb{E} \{ X_i \}$.


6.4 Sarsa: On-policy TD Control


Example 6.2 Random Walk

In this example we empirically compare the prediction abilities of TD(0) and constant-$\alpha$ MC when applied to a simple Markov reward process (MRP), a random walk along a chain of states.

In this MRP the agent's actions are irrelevant: at each step the state moves either left or right with equal probability. An episode ends when the walk steps off either end of the chain. Exiting on the right gives a reward of 1; every other transition gives a reward of 0. Below is an animation of the agent randomly moving through an episode. On average, episode length grows roughly quadratically with the length of the chain. The code follows the visualizations.
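Here is a minimal sketch of TD(0) prediction on this walk. The function name and defaults are our own assumptions, not the notebook's code. The chain has 5 non-terminal states, every episode starts in the centre state, $\gamma = 1$, $\alpha = 0.01$, and all estimates start at 0.5. For a chain of $n$ states, the true value of state $k$ is the probability of exiting on the right, $k/(n+1)$, so the estimates can be checked directly.

```julia
using Random

# Hypothetical sketch of TD(0) prediction on the Example 6.2 random walk.
# Non-terminal states are 1..n; stepping left from 1 or right from n ends the
# episode. Only the right exit gives reward 1, γ = 1, and every episode starts
# in the centre state.
function td0_random_walk(; n = 5, episodes = 5_000, α = 0.01, seed = 1)
    rng = MersenneTwister(seed)
    V = fill(0.5, n)                 # all estimates start at 0.5
    for _ in 1:episodes
        s = (n + 1) ÷ 2
        while true
            s_next = s + rand(rng, (-1, 1))          # left or right, equally likely
            if s_next == 0                            # left exit: reward 0
                V[s] += α * (0.0 - V[s])
                break
            elseif s_next == n + 1                    # right exit: reward 1
                V[s] += α * (1.0 - V[s])
                break
            else                                      # interior move: reward 0
                V[s] += α * (V[s_next] - V[s])
                s = s_next
            end
        end
    end
    return V
end

V = td0_random_walk()
true_V = (1:5) ./ 6      # exact values k / (n + 1) for n = 5
```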


Example 6.6: Cliff Walking


Exercise 6.11

Why is Q-learning considered an off-policy control method?

Compare this with the on-policy (Sarsa) update, where the quantity being estimated at each state-action pair is:

$$q_\pi(S_t, A_t) = \mathbb{E}_\pi [R_{t+1} + \gamma q_\pi(S_{t+1}, A_{t+1}) \mid S_t, A_t]$$

which we estimate by sampling. In Q-learning, the quantity being estimated is instead:

$$q_*(S_t, A_t) = \mathbb{E} [R_{t+1} + \gamma \max_a q_*(S_{t+1}, a) \mid S_t, A_t]$$

The behavior policy that selects the next action from $S_{t+1}$ is $\epsilon$-greedy, so the action actually taken may differ from the maximizing action. The Q-learning update therefore learns the value function of the greedy policy (the optimal action-value function) rather than the value function of the $\epsilon$-greedy behavior policy that generates the data. Because the policy being learned about differs from the policy generating behavior, Q-learning is off-policy. Sarsa, in contrast, learns the value function of the same $\epsilon$-greedy policy it follows, which makes it an on-policy method.
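The difference between the two targets can be made concrete with two hypothetical one-line helpers. Neither is the notebook's code. They assume `Q` is stored as a states × actions matrix. When the behavior policy takes an exploratory action in $S_{t+1}$, the Sarsa target follows that action, while the Q-learning target ignores it.

```julia
# Hypothetical helpers contrasting the two TD targets (not the notebook's code).
# Q is a |S| × |A| matrix of action-value estimates.

# Sarsa target: uses the action a_next actually chosen by the ε-greedy behavior policy.
sarsa_target(Q, r, s_next, a_next, γ) = r + γ * Q[s_next, a_next]

# Q-learning target: uses the greedy action in s_next, whatever action is taken next.
q_learning_target(Q, r, s_next, γ) = r + γ * maximum(@view Q[s_next, :])

Q = [0.0 0.0; 1.0 3.0]               # row 2 is S_{t+1}; action 2 is greedy there
sarsa_target(Q, 0.0, 2, 1, 0.9)      # 0.9, following the exploratory action 1
q_learning_target(Q, 0.0, 2, 0.9)    # ≈ 2.7, using the greedy action regardless
```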


6.3 Optimality of TD(0)

Suppose only a finite amount of experience is available, say 10 episodes or 100 time steps. In this case, a common approach with incremental learning methods is to present the experience repeatedly until the method converges upon an answer. Given an approximate value function $V$, the increments specified by (6.1) or (6.2) are computed for every time step $t$ at which a nonterminal state is visited, but the value function is changed only once, by the sum of all the increments. Then all the available experience is processed again with the new value function to produce a new overall increment, and so on, until the value function converges. We call this batch updating because updates are made only after processing each complete batch of training data.
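The procedure can be sketched in a few lines. The names and data layout here are assumptions made for this illustration only. Each episode is a vector of `(s, r, s_next)` transitions with `s_next == 0` marking termination, and $\gamma = 1$. The toy batch at the end (states A and B) is the one from Example 6.4 of Sutton and Barto. It shows the two methods converging to different answers. Batch TD(0) should give $V(A) \approx V(B) \approx 0.75$, while batch MC should give $V(A) \approx 0$ and $V(B) \approx 0.75$.

```julia
# Hypothetical sketch of batch updating (not the notebook's implementation).
# An episode is a vector of transitions (s, r, s_next); s_next == 0 marks termination.
# Each sweep sums the increments over the whole batch and applies them once.
const Transition = Tuple{Int, Float64, Int}

function batch_td0(episodes::Vector{Vector{Transition}}, nstates::Int;
                   α = 0.01, γ = 1.0, tol = 1e-10, maxsweeps = 100_000)
    V = zeros(nstates)
    for _ in 1:maxsweeps
        Δ = zeros(nstates)
        for ep in episodes
            for (s, r, s_next) in ep
                v_next = s_next == 0 ? 0.0 : V[s_next]
                Δ[s] += α * (r + γ * v_next - V[s])   # increment from (6.2)
            end
        end
        V .+= Δ                                        # one change per sweep
        maximum(abs, Δ) < tol && break
    end
    return V
end

function batch_mc(episodes::Vector{Vector{Transition}}, nstates::Int;
                  α = 0.01, γ = 1.0, tol = 1e-10, maxsweeps = 100_000)
    V = zeros(nstates)
    for _ in 1:maxsweeps
        Δ = zeros(nstates)
        for ep in episodes
            G = 0.0
            for (s, r, _) in reverse(ep)               # returns built backwards
                G = r + γ * G
                Δ[s] += α * (G - V[s])                 # increment from (6.1)
            end
        end
        V .+= Δ                                        # one change per sweep
        maximum(abs, Δ) < tol && break
    end
    return V
end

# Toy batch (A = 1, B = 2): one episode A,0,B,0; six episodes B,1; one episode B,0.
A, B = 1, 2
episodes = [[(A, 0.0, B), (B, 0.0, 0)]]
append!(episodes, [[(B, 1.0, 0)] for _ in 1:6])
push!(episodes, [(B, 0.0, 0)])

batch_td0(episodes, 2)   # ≈ [0.75, 0.75]
batch_mc(episodes, 2)    # ≈ [0.0, 0.75]
```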

Under batch updating, TD(0) converges deterministically to a single answer independent of the step-size parameter $\alpha$, as long as $\alpha$ is chosen to be sufficiently small. The constant-$\alpha$ MC method also converges deterministically under the same conditions, but to a different answer. Understanding these two answers will help us understand the difference between the two methods. Under normal updating the methods do not move all the way to their respective batch answers, but in some sense they take steps in these directions. Before trying to understand the two answers in general, for all possible tasks, we first look at a few examples.

Example 6.3: Random walk under batch updating

Batch-updating versions of TD(0) and constant-$\alpha$ MC were applied as follows to the random walk prediction example (Example 6.2). After each new episode, all episodes seen so far were treated as a batch. They were repeatedly presented to the algorithm, either TD(0) or constant-$\alpha$ MC, with $\alpha$ sufficiently small that the value function converged. The resulting value function was then compared with $v_\pi$, and the average root mean square error across the five states (and across 100 independent repetitions of the whole experiment) was plotted to obtain the learning curves shown in Figure 6.2. Note that the batch TD method was consistently better than the batch Monte Carlo method.

Under batch training, constant-$\alpha$ MC converges to the values $V(s)$ that are sample averages of the actual returns experienced after visiting each state $s$. These are optimal estimates in the sense that they minimize the mean square error from the actual returns in the training set. It is therefore surprising that the batch TD method performed better according to the root mean square error measure shown in Figure 6.2. How could batch TD outperform this optimal method? The answer is that the Monte Carlo method is optimal only in a limited way, and that TD is optimal in a way that is more relevant to predicting returns.

Below is code implementing both batch methods in general for arbitrary MDPs.


Actions


Sarsa converges with probability 1 to an optimal policy and action-value function, under the usual conditions on step sizes (2.7), as long as all state-action pairs are visited an infinite number of times and the policy converges in the limit to the greedy policy (which can be arranged, for example, with $\epsilon$-greedy policies by setting $\epsilon = 1/t$). Below is code that implements Sarsa using the $\epsilon$-greedy method for exploration.
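The exploration step can be sketched as follows, with $\epsilon$ decayed as $1/t$ so that the policy becomes greedy in the limit. The names `epsilon_greedy` and `glie_epsilon` are hypothetical and are not the notebook's functions. Breaking ties among greedy actions at random is our own choice.

```julia
using Random

# Hypothetical ε-greedy action selection for Sarsa (not the notebook's code).
# q_row holds Q(s, ·) for the current state; ties among greedy actions are broken at random.
function epsilon_greedy(rng::AbstractRNG, q_row::AbstractVector{<:Real}, ε::Real)
    if rand(rng) < ε
        return rand(rng, 1:length(q_row))            # explore: uniform over all actions
    end
    best = maximum(q_row)
    return rand(rng, findall(==(best), q_row))       # exploit: a random greedy action
end

# Decaying schedule ε_t = 1 / t for t ≥ 1, so exploration vanishes in the limit.
glie_epsilon(t::Integer) = 1 / t

rng = MersenneTwister(0)
a = epsilon_greedy(rng, [0.0, 2.0, 1.0], glie_epsilon(10))   # action 2 with probability ≈ 0.93
```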

Dict¬prefix_short¤Dict¨objectid°d73841a2172a4792Ù!application/vnd.pluto.tree+object’§actions’…¦prefixÙ%Main.var"workspace#3".GridworldAction¨elements”’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object¤type¥Array¬prefix_short ¨objectid°952f6adeb23ade52Ù!application/vnd.pluto.tree+object’¬actionlookup’…¦prefixÙ2Dict{Main.var"workspace#3".GridworldAction, Int64}¨elements”’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’¡2ªtext/plain’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’¡3ªtext/plain’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object’¡4ªtext/plain’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’¡1ªtext/plain¤type¤Dict¬prefix_short¤Dict¨objectid°bfde935e6876dc7bÙ!application/vnd.pluto.tree+object’ªstate_init’Ù%#108 (generic function with 1 method)ªtext/plain’¤step’Ú(::Main.var"workspace#3".var"#tr#115"{Main.var"workspace#3".var"#221#223", Main.var"workspace#3".var"#step#114"{Main.var"workspace#3".var"#220#222", Vector{Int64}, Main.var"workspace#3".var"#boundstate#113"{Int64, Int64}}}) (generic function with 1 method)ªtext/plain’¦isterm’Ùq(::Main.var"workspace#3".var"#isterm#116"{Main.var"workspace#3".GridworldState}) (generic function with 1 method)ªtext/plain¤type¦struct¬prefix_short¦MDP_TD¨objectid°4b7046f5bb96df87¤mimeÙ!application/vnd.pluto.tree+object¬rootassigneeµconst 
noisy_gridworld²last_run_timestampËAÚš/a1°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$98bec66e-d8f3-4d4d-b4ec-5838489164e5¹depends_on_disabled_cellsÂ§runtimeÎæ_µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$b59eacf8-7f78-4015-bf2c-66f89bf0e24eŠ¦queuedÂ¤logs§runningÂ¦output†¤bodyÚ

Exercise 6.10: Stochastic Wind (programming)

Re-solve the windy gridworld task with King's moves, assuming the effect of the wind, if there is any, is stochastic, sometimes varying by 1 from the mean values given for each column. That is, a third of the time you move exactly according to these values, as in the previous exercise, but also a third of the time you move one cell above that, and another third of the time you move one cell below that. For example, if you are one cell to the right of the goal and you move left, then one-third of the time you move one cell above the goal, one-third of the time you move two cells above the goal, and one-third of the time you move to the goal.
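Compared with the deterministic version, only the way the wind shift is drawn changes. Below is a minimal sketch of one King's-move step with stochastic wind. It assumes the book's 10 × 7 layout with y increasing upward and the column wind strengths (0, 0, 0, 1, 1, 1, 2, 2, 1, 0), and it applies the wind of the column being left. The constant and function names are hypothetical.

```julia
using Random

# Assumptions for illustration: a 10 x 7 grid with y increasing upward and the
# column wind strengths of the book's windy gridworld; names are hypothetical.
const WIND_STRENGTH = (0, 0, 0, 1, 1, 1, 2, 2, 1, 0)
const GRID_COLS = 10
const GRID_ROWS = 7

# Upward shift from the column being left: 0 where there is no wind, otherwise
# the mean value minus 1, the mean, or the mean plus 1, each with probability 1/3.
function sample_wind(col::Int, rng::AbstractRNG)
    w = WIND_STRENGTH[col]
    return w == 0 ? 0 : w + rand(rng, -1:1)
end

# One King's-move step: dx and dy are each in -1:1; the result is clipped to the grid.
function stochastic_windy_step(x::Int, y::Int, dx::Int, dy::Int, rng::AbstractRNG)
    shift = sample_wind(x, rng)
    return (clamp(x + dx, 1, GRID_COLS), clamp(y + dy + shift, 1, GRID_ROWS))
end
```

With the goal assumed at `(8, 4)`, repeated calls to `stochastic_windy_step(9, 4, -1, 0, rng)` reproduce the example in the exercise: the agent lands on `(8, 4)`, `(8, 5)` or `(8, 6)`, each about a third of the time.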

¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚ•Þºþ?°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$b59eacf8-7f78-4015-bf2c-66f89bf0e24e¹depends_on_disabled_cellsÂ§runtimeÎÛµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÚ ?

Exercise 6.8

Show that an action-value version of (6.6) holds for the action-value form of the TD error $\delta_t=R_{t+1}+\gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)$, again assuming that the values don't change from step to step.

The derivation in (6.6) starts with the definition in (3.9):

$$G_t = R_{t+1} + \gamma G_{t+1}$$

and derives the following:

$$\delta_t \doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$$

$$G_t - V(S_t) = \sum_{k=t}^{T-1} \gamma^{k-t} \delta_k$$

Now we have the action-value form of the TD error:

$$\delta_t \doteq R_{t+1}+\gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)$$

Let us transform (3.9) in a similar manner to derive the rule:

$$\begin{flalign} G_t - Q(S_t, A_t) &= R_{t+1} + \gamma G_{t+1} - Q(S_t, A_t) + \gamma Q(S_{t+1}, A_{t+1}) - \gamma Q(S_{t+1}, A_{t+1}) \\ &= \delta_t + \gamma (G_{t+1} - Q(S_{t+1}, A_{t+1})) \\ &= \delta_t + \gamma \delta_{t+1} + \gamma^2 (G_{t+2} - Q(S_{t+2}, A_{t+2})) \tag{using recursion} \\ &= \delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1} \delta_{T-1} + \gamma^{T-t}(G_T - Q(S_T, A_T)) \\ &= \delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1} \delta_{T-1} + \gamma^{T-t}(0-0) \tag{terminal value} \\ &= \sum_{k=t}^{T-1}\gamma^{k-t}\delta_k \end{flalign}$$
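The identity is purely algebraic, so it can be checked numerically. Draw arbitrary rewards and action values for one episode, set the terminal action value to 0, and compare both sides for every $t$. The function name and index convention in this small Julia sketch are illustrative assumptions.

```julia
using Random

# Checks G_t - Q(S_t, A_t) = sum_{k=t}^{T-1} γ^(k-t) δ_k on one synthetic episode
# with random rewards and fixed action values (as the exercise assumes).
# Index convention: R[k+1] = R_{k+1}, q[k+1] = Q(S_k, A_k), q[T+1] = 0 (terminal pair).
function check_action_value_identity(T::Int, γ::Float64, rng::AbstractRNG)
    R = randn(rng, T)
    q = [randn(rng, T); 0.0]
    G = zeros(T + 1)                                  # G[k+1] = G_k, with G_T = 0
    for k in T-1:-1:0
        G[k+1] = R[k+1] + γ * G[k+2]                  # recursion (3.9)
    end
    δ = [R[k+1] + γ * q[k+2] - q[k+1] for k in 0:T-1] # δ[k+1] = δ_k
    return all(t -> isapprox(G[t+1] - q[t+1],
                             sum(γ^(k - t) * δ[k+1] for k in t:T-1); atol = 1e-9),
               0:T-1)
end
```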

¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚ•ÞºcÔ°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5¹depends_on_disabled_cellsÂ§runtimeÎµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$7d3be915-9092-4261-8435-dd546a7db144Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ(cum_max (generic function with 1 method)¤mimeªtext/plain¬rootassigneeÀ²last_run_timestampËAÚšt†±°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$7d3be915-9092-4261-8435-dd546a7db144¹depends_on_disabled_cellsÂ§runtimeÎZ=µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$71774d5f-7841-403f-bc6b-1a0cbbb72d6dŠ¦queuedÂ¤logs§runningÂ¦output†¤body…¦prefixÙ3FiniteMDP{Float32, GridworldState, GridworldAction}¨elements™’¦states’…¦prefixÙ$Main.var"workspace#3".GridworldState¨elements›’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°78e123e4d06443c5Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡2ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°e3e6b18864c38362Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°7d75a915b81b9730Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°32586272439d3588Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡5ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°593769200b7ddf14Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡6ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°d7705072ebc67732Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid¯32fa797472e0a83Ù!application/vnd.pluto.tree+object’’…¦pre
fix®GridworldState¨elements’’¡x’¡2ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°ef30e57ae60bdc38Ù!application/vnd.pluto.tree+object’ ’…¦prefix®GridworldState¨elements’’¡x’¡2ªtext/plain’¡y’¡2ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°74f49756a2864a57Ù!application/vnd.pluto.tree+object¤more’F’…¦prefix®GridworldState¨elements’’¡x’¢10ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°91d5970141de4b2dÙ!application/vnd.pluto.tree+object¤type¥Array¬prefix_short ¨objectid°ee66261cf47f9401Ù!application/vnd.pluto.tree+object’§actions’…¦prefixÙ%Main.var"workspace#3".GridworldAction¨elements”’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object¤type¥Array¬prefix_short ¨objectid°952f6adeb23ade52Ù!application/vnd.pluto.tree+object’§rewards’…¦prefix§Float32¨elements’’’£0.0ªtext/plain’’¤-1.0ªtext/plain¤type¥Array¬prefix_short ¨objectid¯781de2e275c431bÙ!application/vnd.pluto.tree+object’£ptf’Ú½70Ã—2Ã—4Ã—70 Array{Float32, 4}: [:, :, 1, 1] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 1] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 1] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 1] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 2] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0 0.0 0.0 0.0 0.0 [:, :, 2, 2] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 2] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 2] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 3] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 3] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ;;;; â€¦ [:, :, 1, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 2, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 1, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 2, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 3, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 1, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0 1.0 [:, :, 2, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 3, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0ªtext/plain’®action_scratch’…¦prefix§Float32¨elements”’’§334.268ªtext/plain’’§332.964ªtext/plain’’¦333.09ªtext/plain’’§333.289ªtext/plain¤type¥Array¬prefix_short ¨objectid°b3ed763e003373d8Ù!application/vnd.pluto.tree+object’state_scratch’…¦prefix§Float32¨elements›’’§272.455ªtext/plain’’¦273.38ªtext/plain’’«0.000616613ªtext/plain’’ª4.5677f-41ªtext/plain’’«2.08591f-22ªtext/plain’’ª4.5677f-41ªtext/plain’’¨1.32f-43ªtext/plain’’£0.0ªtext/plain’ ’«-3.27565f35ªtext/plain¤more’G’§1.4f-43ªtext/plain¤type¥Array¬prefix_short ¨objectid°3d1f60899297466dÙ!application/vnd.pluto.tree+object’®reward_scratch’…¦prefix§Float32¨elements’’’§4.0f-45ªtext/plain’’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°2d531d9ef2febf74Ù!application/vnd.pluto.tree+object’«state_index’…¦prefixÙ1Dict{Main.var"workspace#3".GridworldState, 
Int64}¨elements›’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡5ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°14e5eae9a48c6749Ù!application/vnd.pluto.tree+object’¢54ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡6ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°4e5052ac2b36c8beÙ!application/vnd.pluto.tree+object’¢39ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡7ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°6d65389daed97014Ù!application/vnd.pluto.tree+object’¢46ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°b85af438304886c5Ù!application/vnd.pluto.tree+object’¢53ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¢10ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°dad6dff35c9621ffÙ!application/vnd.pluto.tree+object’¢64ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡6ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°4e4b90239eb3be65Ù!application/vnd.pluto.tree+object’¢42ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°6d43cd1ca99a553eÙ!application/vnd.pluto.tree+object’¢50ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡2ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°166e372c47e8ffa6Ù!application/vnd.pluto.tree+object’¢10ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡5ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°f8402269233868c7Ù!application/vnd.pluto.tree+object’¢31ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°b08053c76dcd8072Ù!application/vnd.pluto.tree+object’¢56ªtext/plain¤more¤type¤Dict¬prefix_short¤Dict¨objectid°efe2304abefe6e4cÙ!application/vnd.pluto.tree+object’¬action_index’…¦prefixÙ2Dict{Main.var"
workspace#3".GridworldAction, Int64}¨elements”’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’¡2ªtext/plain’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’¡3ªtext/plain’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object’¡4ªtext/plain’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’¡1ªtext/plain¤type¤Dict¬prefix_short¤Dict¨objectid°ea6ca163b4135382Ù!application/vnd.pluto.tree+object¤type¦struct¬prefix_short©FiniteMDP¨objectid°b5f665419f0d8ca4¤mimeÙ!application/vnd.pluto.tree+object¬rootassignee¼const windy_gridworld_mdp_dp²last_run_timestampËAÚš}´c°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$71774d5f-7841-403f-bc6b-1a0cbbb72d6d¹depends_on_disabled_cellsÂ§runtimeÎ_›µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$22c2213e-5b9b-410f-a0ef-8f1e3db3c532Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÚk'

Figure 6.2

Performance of TD(0) and constant-α MC under batch training on the random walk task with 5 states
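Batch training presents the same finite set of episodes over and over. Within each sweep the increments are summed, the value function is changed only once at the end of the sweep, and sweeps repeat until the values stop changing. Below is a minimal sketch of both batch updates. It assumes an episode encoding of `(state, reward)` pairs with γ = 1, and the function and type names are hypothetical. The test uses the eight episodes of the book's Example 6.4 rather than the random walk, because the batch answers there are known exactly: batch TD(0) gives V(A) = V(B) = 3/4, while batch MC gives V(A) = 0 and V(B) = 3/4.

```julia
# Each episode is a vector of (state, reward) pairs: the state visited and the reward
# received on leaving it. Episodes end in a terminal state with value 0; γ = 1.
const Episode = Vector{Tuple{Int,Float64}}

# Sweeps all episodes, sums the increments, applies them once per sweep, and stops
# when the largest summed increment falls below tol.
function batch_values(episodes::Vector{Episode}, nstates::Int, method::Symbol;
                      α = 0.01, tol = 1e-10, maxsweeps = 100_000)
    method in (:td, :mc) || throw(ArgumentError("method must be :td or :mc"))
    V = zeros(nstates)
    for _ in 1:maxsweeps
        Δ = zeros(nstates)
        for ep in episodes
            G = 0.0                                   # MC return, accumulated backwards
            for i in length(ep):-1:1
                s, r = ep[i]
                if method === :td
                    v_next = i < length(ep) ? V[ep[i+1][1]] : 0.0
                    Δ[s] += α * (r + v_next - V[s])   # TD(0) increment
                else
                    G += r
                    Δ[s] += α * (G - V[s])            # constant-α MC increment
                end
            end
        end
        V .+= Δ                                       # one change per sweep
        maximum(abs, Δ) < tol && break
    end
    return V
end
```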

¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚ•çÌbž°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$22c2213e-5b9b-410f-a0ef-8f1e3db3c532¹depends_on_disabled_cellsÂ§runtimeÎL§µpublished_object_keys‘Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/d3030aa42e1dd0c8¸depends_on_skipped_cellsÂ§erroredÂÙ$39470c74-e554-4f6c-919d-97bec1eec0f3Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÙÍ

With King's moves added, the optimal policy reaches the goal in 7 steps, compared with 15 for the original four actions. What happens if we add a ninth action that causes no movement other than that caused by the wind?
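Because the wind is deterministic, the step counts above can be checked without any learning by running a breadth-first search over grid cells. The sketch assumes the book's layout: a 10 × 7 grid with y increasing upward, start (1, 4), goal (8, 4), and wind strengths (0, 0, 0, 1, 1, 1, 2, 2, 1, 0) taken from the column being left. All names are hypothetical.

```julia
# Assumed column wind strengths (book's windy gridworld); the wind pushes toward larger y.
const WIND_BY_COLUMN = (0, 0, 0, 1, 1, 1, 2, 2, 1, 0)

# Length of the shortest episode under deterministic wind, by breadth-first search.
function min_steps(moves::Vector{Tuple{Int,Int}}; start = (1, 4), goal = (8, 4),
                   ncols = 10, nrows = 7)
    dist = Dict(start => 0)
    queue = [start]
    head = 1
    while head <= length(queue)
        pos = queue[head]
        head += 1
        pos == goal && return dist[pos]
        x, y = pos
        for (dx, dy) in moves
            # the wind of the current column is added, then the position is clipped
            nxt = (clamp(x + dx, 1, ncols), clamp(y + dy + WIND_BY_COLUMN[x], 1, nrows))
            if !haskey(dist, nxt)
                dist[nxt] = dist[pos] + 1
                push!(queue, nxt)
            end
        end
    end
    return -1                                         # goal unreachable
end

const ROOK_MOVES = [(0, 1), (0, -1), (-1, 0), (1, 0)]        # Up, Down, Left, Right
const KING_MOVES = [ROOK_MOVES; [(1, 1), (-1, 1), (1, -1), (-1, -1)]]
const KING_MOVES_STAY = [KING_MOVES; [(0, 0)]]                # ninth action: no own movement
```

Every action changes x by at most 1 and the goal is 7 columns from the start, so no action set can finish in fewer than 7 steps under these assumptions. Calling `min_steps(KING_MOVES_STAY)` shows directly whether the ninth action changes the optimal path length.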

¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚ•ÞºË%°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$39470c74-e554-4f6c-919d-97bec1eec0f3¹depends_on_disabled_cellsÂ§runtimeÎ\Áµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ1show_grid_policy (generic function with 1 method)¤mimeªtext/plain¬rootassigneeÀ²last_run_timestampËAÚ•èß‡'°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297¹depends_on_disabled_cellsÂ§runtimeÎgÞ8µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$415ea466-2038-48fe-9d24-39a90182f1ebŠ¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ3monte_carlo_pred_V (generic function with 1 method)¤mimeªtext/plain¬rootassigneeÀ²last_run_timestampËAÚ•âCð°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$415ea466-2038-48fe-9d24-39a90182f1eb¹depends_on_disabled_cellsÂ§runtimeÎ¶i„µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$0e488135-49e5-4e71-83b1-05d8e61f0510Š¦queuedÂ¤logs§runningÂ¦output†¤body…¦prefixÙ3FiniteMDP{Float32, GridworldState, 
GridworldAction}¨elements™’¦states’…¦prefixÙ$Main.var"workspace#3".GridworldState¨elements›’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°78e123e4d06443c5Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡2ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°e3e6b18864c38362Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°7d75a915b81b9730Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°32586272439d3588Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡5ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°593769200b7ddf14Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡6ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°d7705072ebc67732Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡1ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid¯32fa797472e0a83Ù!application/vnd.pluto.tree+object’’…¦prefix®GridworldState¨elements’’¡x’¡2ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°ef30e57ae60bdc38Ù!application/vnd.pluto.tree+object’ ’…¦prefix®GridworldState¨elements’’¡x’¡2ªtext/plain’¡y’¡2ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°74f49756a2864a57Ù!application/vnd.pluto.tree+object¤more’F’…¦prefix®GridworldState¨elements’’¡x’¢10ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°91d5970141de4b2dÙ!application/vnd.pluto.tree+object¤type¥Array¬prefix_short 
¨objectid°6f5984a96b457d74Ù!application/vnd.pluto.tree+object’§actions’…¦prefixÙ%Main.var"workspace#3".GridworldAction¨elements™’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object’’…¦prefix§UpRight¨elements¤type¦struct¬prefix_short§UpRight¨objectid°ffffffff65dca132Ù!application/vnd.pluto.tree+object’’…¦prefix¦UpLeft¨elements¤type¦struct¬prefix_short¦UpLeft¨objectid°ffffffff68f3503eÙ!application/vnd.pluto.tree+object’’…¦prefix©DownRight¨elements¤type¦struct¬prefix_short©DownRight¨objectid°ffffffff97f641f9Ù!application/vnd.pluto.tree+object’’…¦prefix¨DownLeft¨elements¤type¦struct¬prefix_short¨DownLeft¨objectid°ffffffffd243dd41Ù!application/vnd.pluto.tree+object’ ’…¦prefix¤Stay¨elements¤type¦struct¬prefix_short¤Stay¨objectid°ffffffff40b55070Ù!application/vnd.pluto.tree+object¤type¥Array¬prefix_short ¨objectid°6fc895826c2337b2Ù!application/vnd.pluto.tree+object’§rewards’…¦prefix§Float32¨elements’’’£0.0ªtext/plain’’¤-1.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°8ba99ab12ffe5417Ù!application/vnd.pluto.tree+object’£ptf’Ú ö70Ã—2Ã—9Ã—70 Array{Float32, 4}: [:, :, 1, 1] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 1] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 1] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 1] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 5, 1] = 0.0 0.0 0.0 0.0 
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 6, 1] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 1] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 1] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 9, 1] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 2] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 2] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 2] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 2] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 5, 2] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 6, 2] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 2] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 2] = 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 9, 2] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 1, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 2, 3] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 3] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 
0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 5, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 6, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 3] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 3] = 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 9, 3] = 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ;;;; â€¦ [:, :, 1, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 2, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 3, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 5, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 6, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 8, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 9, 68] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 1, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 2, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 3, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 5, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 6, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 [:, :, 8, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 9, 69] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 1, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 2, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 3, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 4, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 5, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 [:, :, 6, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 7, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 [:, :, 8, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 [:, :, 9, 70] = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 â‹® 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0ªtext/plain’®action_scratch’…¦prefix§Float32¨elements™’’£1.0ªtext/plain’’£0.5ªtext/plain’’§1.0f-45ªtext/plain’’£0.0ªtext/plain’’§3.0f-45ªtext/plain’’£0.0ªtext/plain’’§4.0f-45ªtext/plain’’£0.0ªtext/plain’ ’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°1a6888bacd7c30e0Ù!application/vnd.pluto.tree+object’state_scratch’…¦prefix§Float32¨elements›’’§1.0f-45ªtext/plain’’£0.0ªtext/plain’’§1.0f-45ªtext/plain’’£0.0ªtext/plain’’§1.0f-45ªtext/plain’’£0.0ªtext/plain’’§1.0f-45ªtext/plain’’£0.0ªtext/plain’ ’§1.0f-45ªtext/plain¤more’G’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°2e9e2ed475e79827Ù!application/vnd.pluto.tree+object’®reward_scratch’…¦prefix§Float32¨elements’’’§1.0f-45ªtext/plain’’£0.0ªtext/plain¤type¥Array¬prefix_short ¨objectid°9d5f2048d3697c1eÙ!application/vnd.pluto.tree+object’«state_index’…¦prefixÙ1Dict{Main.var"workspace#3".GridworldState, 
Int64}¨elements›’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡5ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°14e5eae9a48c6749Ù!application/vnd.pluto.tree+object’¢54ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡6ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°4e5052ac2b36c8beÙ!application/vnd.pluto.tree+object’¢39ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡7ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°6d65389daed97014Ù!application/vnd.pluto.tree+object’¢46ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡4ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°b85af438304886c5Ù!application/vnd.pluto.tree+object’¢53ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¢10ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°dad6dff35c9621ffÙ!application/vnd.pluto.tree+object’¢64ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡6ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°4e4b90239eb3be65Ù!application/vnd.pluto.tree+object’¢42ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡1ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°6d43cd1ca99a553eÙ!application/vnd.pluto.tree+object’¢50ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡2ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°166e372c47e8ffa6Ù!application/vnd.pluto.tree+object’¢10ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡5ªtext/plain’¡y’¡3ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°f8402269233868c7Ù!application/vnd.pluto.tree+object’¢31ªtext/plain’’…¦prefix®GridworldState¨elements’’¡x’¡8ªtext/plain’¡y’¡7ªtext/plain¤type¦struct¬prefix_short®GridworldState¨objectid°b08053c76dcd8072Ù!application/vnd.pluto.tree+object’¢56ªtext/plain¤more¤type¤Dict¬prefix_short¤Dict¨objectid°b59e95c60e92f130Ù!application/vnd.pluto.tree+object’¬action_index’…¦prefixÙ2Dict{Main.var"
workspace#3".GridworldAction, Int64}¨elements™’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’¡2ªtext/plain’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’¡3ªtext/plain’’…¦prefix§UpRight¨elements¤type¦struct¬prefix_short§UpRight¨objectid°ffffffff65dca132Ù!application/vnd.pluto.tree+object’¡5ªtext/plain’’…¦prefix©DownRight¨elements¤type¦struct¬prefix_short©DownRight¨objectid°ffffffff97f641f9Ù!application/vnd.pluto.tree+object’¡7ªtext/plain’’…¦prefix¨DownLeft¨elements¤type¦struct¬prefix_short¨DownLeft¨objectid°ffffffffd243dd41Ù!application/vnd.pluto.tree+object’¡8ªtext/plain’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object’¡4ªtext/plain’’…¦prefix¦UpLeft¨elements¤type¦struct¬prefix_short¦UpLeft¨objectid°ffffffff68f3503eÙ!application/vnd.pluto.tree+object’¡6ªtext/plain’’…¦prefix¤Stay¨elements¤type¦struct¬prefix_short¤Stay¨objectid°ffffffff40b55070Ù!application/vnd.pluto.tree+object’¡9ªtext/plain’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’¡1ªtext/plain¤type¤Dict¬prefix_short¤Dict¨objectid¯7a41f62b29e0578Ù!application/vnd.pluto.tree+object¤type¦struct¬prefix_short©FiniteMDP¨objectid°9e9f13d9855b28db¤mimeÙ!application/vnd.pluto.tree+object¬rootassignee¿const kingplus_gridworld_mdp_dp²last_run_timestampËAÚšŽ37°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$0e488135-49e5-4e71-83b1-05d8e61f0510¹depends_on_disabled_cellsÂ§runtimeÎ \µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÚjç

Afterstate Value Iteration Results for Jack's Car Rental

¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚš!Ñ>S°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893¹depends_on_disabled_cellsÂ§runtimeÎ´†™@µpublished_object_keys‘Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/d8c715e8e34d7d99¸depends_on_skipped_cellsÂ§erroredÂÙ$6d9ae541-cf8c-4687-9f0a-f008944657e3Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ+figure_6_3 (generic function with 1 method)¤mimeªtext/plain¬rootassigneeÀ²last_run_timestampËAÚ•ééà°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$6d9ae541-cf8c-4687-9f0a-f008944657e3¹depends_on_disabled_cellsÂ§runtimeÎþnSµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8Š¦queuedÂ¤logs§runningÂ¦output†¤bodyÚq

Reference equations:

$$\begin{flalign} V(S_t) &\leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)] \tag{6.2} \\ \delta_t &\doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \tag{6.5} \end{flalign}$$

Re-write equation (6.5) using the values known at time t. $V_t$ means the value function estimate at time $t$.

$$\delta_t \doteq R_{t+1} + \gamma V_t(S_{t+1}) - V_t(S_t)$$

Now equation (6.2) becomes

$$V_{t+1}(S_t) = V_t(S_t) + \alpha \delta_t$$

¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚ•Þ±GI°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8¹depends_on_disabled_cellsÂ§runtimeÎU’µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$f2115666-86ce-4c80-9eb7-490cc7a7715cŠ¦queuedÂ¤logs§runningÂ¦output†¤bodyÙ¾

With the original value initialization, the error passes through a minimum early on due to the symmetry of the value updates created by the initial value.

¤mime©text/html¬rootassigneeÀ²last_run_timestampËAÚ•Þ·éÉ°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$f2115666-86ce-4c80-9eb7-490cc7a7715c¹depends_on_disabled_cellsÂ§runtimeÎÊµpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂÙ$2155adfa-7a93-4960-950e-1b123da9eea4Š¦queuedÂ¤logs§runningÂ¦output†¤body…¦prefixÙ%Main.var"workspace#3".GridworldAction¨elements˜’’…¦prefix¢Up¨elements¤type¦struct¬prefix_short¢Up¨objectid°ffffffff92511601Ù!application/vnd.pluto.tree+object’’…¦prefix¤Down¨elements¤type¦struct¬prefix_short¤Down¨objectid°ffffffff0a19a748Ù!application/vnd.pluto.tree+object’’…¦prefix¤Left¨elements¤type¦struct¬prefix_short¤Left¨objectid°ffffffff13951cc2Ù!application/vnd.pluto.tree+object’’…¦prefix¥Right¨elements¤type¦struct¬prefix_short¥Right¨objectid°ffffffffa8d2f4c6Ù!application/vnd.pluto.tree+object’’…¦prefix§UpRight¨elements¤type¦struct¬prefix_short§UpRight¨objectid°ffffffff65dca132Ù!application/vnd.pluto.tree+object’’…¦prefix¦UpLeft¨elements¤type¦struct¬prefix_short¦UpLeft¨objectid°ffffffff68f3503eÙ!application/vnd.pluto.tree+object’’…¦prefix©DownRight¨elements¤type¦struct¬prefix_short©DownRight¨objectid°ffffffff97f641f9Ù!application/vnd.pluto.tree+object’’…¦prefix¨DownLeft¨elements¤type¦struct¬prefix_short¨DownLeft¨objectid°ffffffffd243dd41Ù!application/vnd.pluto.tree+object¤type¥Array¬prefix_short ¨objectid°d84fdc99910d1e41¤mimeÙ!application/vnd.pluto.tree+object¬rootassigneeÀ²last_run_timestampËAÚ•è4Oœ°persist_js_stateÂ·has_pluto_hook_featuresÂ§cell_idÙ$2155adfa-7a93-4960-950e-1b123da9eea4¹depends_on_disabled_cellsÂ§runtimeÍ"µpublished_object_keys¸depends_on_skipped_cellsÂ§erroredÂ±cell_dependenciesÞáÙ$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4„´precedence_heuristic 
§cell_idÙ$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4´downstream_cells_mapºcreate_noisy_gridworld_mdp‘Ù$297f1606-4ec2-4075-9f81-926dc517b76f²upstream_cells_map†¦length¡:©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae§Float32¥zeros¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$5290ae65-6f56-4849-a842-fe347315c6dc„´precedence_heuristic §cell_idÙ$5290ae65-6f56-4849-a842-fe347315c6dc´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$b3d4117f-7db4-43a6-8427-c08f3542d71f„´precedence_heuristic §cell_idÙ$b3d4117f-7db4-43a6-8427-c08f3542d71f´downstream_cells_map§poisson’Ù$ad03500a-bd42-4216-a9cb-3f923152af79Ù$2455742f-dc18-4d6b-9f58-5666adac6919²upstream_cells_map†£exp¡-¡^¡/©factorial¡*Ù$3ed12c33-ab0a-49b1-b9e7-c4305ba35767„´precedence_heuristic §cell_idÙ$3ed12c33-ab0a-49b1-b9e7-c4305ba35767´downstream_cells_map©init_step‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0²upstream_cells_map…sample_action‘Ù$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3f¦Matrix¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5¨Function¤RealÙ$209881b3-3ac8-490e-97bd-fa5ae24a39f5„´precedence_heuristic §cell_idÙ$209881b3-3ac8-490e-97bd-fa5ae24a39f5´downstream_cells_mapupdate_value!‘Ù$3f3ebc9b-b070-4d73-8be9-823b399c664c²upstream_cells_map¡:¤zero£max¦isless¨Function¦Vector¦length¡-¡+£TD0‘Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0¤last¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5ªcalc_error‘Ù$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8¡*AbstractFloatÙ$6e06bd39-486f-425a-bbca-bf363b58988c„´precedence_heuristic §cell_idÙ$6e06bd39-486f-425a-bbca-bf363b58988c´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$e039a5be-4b59-4023-be97-2d1de970be27„´precedence_heuristic §cell_idÙ$e039a5be-4b59-4023-be97-2d1de970be27´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$2786101e-d365-4d6a-8de7-b9794499efb4„´precedence_heuristic 
§cell_idÙ$2786101e-d365-4d6a-8de7-b9794499efb4´downstream_cells_map«example_6_2‘Ù$9db7a268-1e6d-4366-a0ec-ebf54916d3b0²upstream_cells_mapÞ¦string¨make_mrp‘Ù$4ddcd409-c31c-444c-8fcf-7cc45b68d93b¤sqrt®Iterators.take§eachcol¤@htl²make_random_policy‘Ù$8e34202a-f841-4464-9017-cd50194f7987²tabular_TD0_pred_V‘Ù$eb735ead-978b-409c-8990-b5fa7a027ebf©eachindex§scatter¡/¡^·HypertextLiteral.Result¤last¢==¤mean¡:©Iterators·HypertextLiteral.Bypass§collect¢|>¸HypertextLiteral.content²monte_carlo_pred_V‘Ù$415ea466-2038-48fe-9d24-39a90182f1eb¡-¤plot°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¦LayoutÙ$14b456f9-5fd1-4340-a3c7-ab9b91b4e3e0„´precedence_heuristic §cell_idÙ$14b456f9-5fd1-4340-a3c7-ab9b91b4e3e0´downstream_cells_map€²upstream_cells_mapƒ¤Base®Base.Docs.HTML©@html_strÙ$ec285c96-4a75-4af6-8898-ec3176fa34c6„´precedence_heuristic §cell_idÙ$ec285c96-4a75-4af6-8898-ec3176fa34c6´downstream_cells_map´make_windy_gridworld—Ù$ab331778-f892-4690-8bb3-26464e3fc05fÙ$dda222ef-8178-40bb-bf20-d242924c4fabÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06Ù$4ddc7d99-0b79-4689-bd93-8798b105c0a2Ù$64b210e8-223f-41f7-a6b7-8af6183ddf87Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$07c57f37-22be-4c39-8279-d80addcea0c5²upstream_cells_map‹¬rook_actions‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbªapply_wind‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¡:¥Int64¯GridworldAction‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¥clamp¢==¤move“Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$031e1106-7408-4c7e-b78e-b713c19123d1Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7de®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$cafedde8-be94-4697-a511-510a5fea0155„´precedence_heuristic 
§cell_idÙ$cafedde8-be94-4697-a511-510a5fea0155´downstream_cells_map€²upstream_cells_mapƒ¬fig_6_3_load‘Ù$21fbdc3b-4444-4f56-9934-fb58e184d685ªfigure_6_3‘Ù$6d9ae541-cf8c-4687-9f0a-f008944657e3ªcliffworld‘Ù$6faa3015-3ac4-44af-a78c-10b175822441Ù$d526a3a4-63cc-4f94-8f55-98c9a4a9d134„´precedence_heuristic §cell_idÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134´downstream_cells_map±double_q_learning“Ù$69eedbfd-396f-4461-b7a1-c36abc094581Ù$b5e06f59-33b5-414e-9a81-43e8abd07aa3Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_mapAbstractFloat¤zero¥firstµdouble_expected_sarsa‘Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820c£one´create_greedy_policy‘Ù$84a71bf8-0d66-42cd-ac7b-589d63a16eda¡/¦Matrix³make_greedy_policy!“Ù$cb07a6a5-c50a-4900-9e5b-a17dc7ee5710Ù$685a7ba3-0f94-4663-a68a-73fa03bd9445Ù$c4919d14-8cba-43e6-9369-efc52bcb9b23½initialize_state_action_value‘Ù$c5d32889-634b-4b00-8ba7-0d1ecaf94f05¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5¶make_Ïµ_greedy_policy!‘Ù$6b496582-cc0e-4195-87ef-94792b0fff54·create_Ïµ_greedy_policy‘Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$02f34da1-551f-4ce5-a588-7f3a14afd716„´precedence_heuristic §cell_idÙ$02f34da1-551f-4ce5-a588-7f3a14afd716´downstream_cells_map¨wind_var‘Ù$aa0791a5-8cf1-499b-9900-4d0c59be808c²upstream_cells_map€Ù$f11dca8f-5557-49fc-9720-35034eadba57„´precedence_heuristic §cell_idÙ$f11dca8f-5557-49fc-9720-35034eadba57´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$4ddc7d99-0b79-4689-bd93-8798b105c0a2„´precedence_heuristic §cell_idÙ$4ddc7d99-0b79-4689-bd93-8798b105c0a2´downstream_cells_map´stochastic_gridworld’Ù$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7Ù$1e45a661-c2e1-40c2-b27b-5f80f95efdab²upstream_cells_mapƒ¬king_actions‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1´make_windy_gridworld‘Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6¯stochastic_wind‘Ù$aa0791a5-8cf1-499b-9900-4d0c59be808cÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521„´precedence_heuristic 
§cell_idÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521´downstream_cells_map©plot_path•Ù$75bfe913-8757-4789-b708-7d400c225218Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0Ù$84584793-8274-4aa1-854f-b167c7434548²upstream_cells_map²make_random_policy‘Ù$8e34202a-f841-4464-9017-cd50194f7987Ù$cb07a6a5-c50a-4900-9e5b-a17dc7ee5710„´precedence_heuristic §cell_idÙ$cb07a6a5-c50a-4900-9e5b-a17dc7ee5710´downstream_cells_map³make_greedy_policy!“Ù$84a71bf8-0d66-42cd-ac7b-589d63a16edaÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1²upstream_cells_map§extrema¤zero£exp®AbstractVector£sum£one¤Real¦length¡-¡/¡*¢==£absÙ$ddf3bb61-16c9-48c4-95d4-263260309762„´precedence_heuristic §cell_idÙ$ddf3bb61-16c9-48c4-95d4-263260309762´downstream_cells_map¬exercise_6_5’Ù$e8f94345-9ad5-48d4-8709-d796fb55db3fÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6b²upstream_cells_mapÞ¦string¡:©Iterators¨make_mrp‘Ù$4ddcd409-c31c-444c-8fcf-7cc45b68d93b¤sqrt§collect¢|>®Iterators.take§eachcol²make_random_policy‘Ù$8e34202a-f841-4464-9017-cd50194f7987²tabular_TD0_pred_V‘Ù$eb735ead-978b-409c-8990-b5fa7a027ebf¡-§scatter¤plot¡/¡^¡+¤last¦Layout¤meanÙ$d7566d1b-8938-4e2c-8c54-124f790e72ae„´precedence_heuristic 
§cell_idÙ$d7566d1b-8938-4e2c-8c54-124f790e72ae´downstream_cells_map‚©FiniteMDP˜Ù$c4919d14-8cba-43e6-9369-efc52bcb9b23Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$07c57f37-22be-4c39-8279-d80addcea0c5Ù$7ac99619-5232-4db8-8553-d79ea5415d29Ù$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4Ù$dea61907-d4fb-492d-b2bb-c037c7f785cbÙ$3134e913-1e86-495d-a558-c3ec4828bf7bÙ$2455742f-dc18-4d6b-9f58-5666adac6919«CompleteMDP–Ù$393cd9d2-dd97-496e-b260-ec6e8b1c13b5Ù$d7566d1b-8938-4e2c-8c54-124f790e72aeÙ$0748902c-ffc0-4634-9a1b-e642b3dfb77bÙ$4019c974-dcaa-46c8-ac90-e6566a376ea1Ù$30e663da-282c-42ff-8171-dbe3c5c467c6Ù$7ed07ddc-1c63-4ce7-bfd3-6da54304d297²upstream_cells_mapŒ¤Dict£zip¦Vector¤Real¥Int64£new©eachindex¦length«CompleteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¡+¥undef¥ArrayÙ$42799973-9884-4a0e-b29a-039890e92d21„´precedence_heuristic §cell_idÙ$42799973-9884-4a0e-b29a-039890e92d21´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$187fc682-2282-46ca-b988-c9de438f36fd„´precedence_heuristic §cell_idÙ$187fc682-2282-46ca-b988-c9de438f36fd´downstream_cells_mapªparams_6_2‘Ù$22c2213e-5b9b-410f-a0ef-8f1e3db3c532²upstream_cells_mapŽ§@md_str¤Core¡:§PlutoUI‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¨Base.get¥@bind¦Slider¤Base«PlutoRunner·PlutoRunner.create_bond§confirm¯Core.applicable¯PlutoUI.combine¨getindexÙ$8fe856ec-5f0a-4483-bb7d-3f6fe270b6f3„´precedence_heuristic §cell_idÙ$8fe856ec-5f0a-4483-bb7d-3f6fe270b6f3´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$8e15f4b5-0dc7-47a5-9477-9f4d8807b331„´precedence_heuristic §cell_idÙ$8e15f4b5-0dc7-47a5-9477-9f4d8807b331´downstream_cells_map»stochastic_gridworld_mdp_dp‘Ù$d299d800-a64e-4ba2-9603-efa833343405²upstream_cells_map„¬king_actions‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¿create_stochastic_gridworld_mdp‘Ù$07c57f37-22be-4c39-8279-d80addcea0c5Ù$9d01c0ef-6313-4091-b444-3e9765aba90c„´precedence_heuristic 
§cell_idÙ$9d01c0ef-6313-4091-b444-3e9765aba90c´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$62a9a36a-bedb-4f5a-80a4-2d4111a65c12„´precedence_heuristic §cell_idÙ$62a9a36a-bedb-4f5a-80a4-2d4111a65c12´downstream_cells_map€²upstream_cells_mapˆ§@md_strBase.getindex¤Base·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¸HypertextLiteral.content¤@htlÙ$2651af2d-56a8-4f7e-a56a-45cabd665c72„´precedence_heuristic §cell_idÙ$2651af2d-56a8-4f7e-a56a-45cabd665c72´downstream_cells_map€²upstream_cells_map‚²max_visual_params2‘Ù$0163763b-a15f-447e-b3d2-32d4bf9d2605»max_bias_visualization_comp‘Ù$3f4f078a-9fc4-4b02-b499-a805fd5f1071Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0„´precedence_heuristic §cell_idÙ$620a6426-cb29-4010-997b-aa4f9d5f8fb0´downstream_cells_mapƒ«BatchMethod’Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0Ù$3f3ebc9b-b070-4d73-8be9-823b399c664c¢MC’Ù$72b4d8d5-464c-4561-8c69-28ef3f59630bÙ$1e3d231a-4065-48ce-a74e-018066fb232a£TD0“Ù$209881b3-3ac8-490e-97bd-fa5ae24a39f5Ù$3f3ebc9b-b070-4d73-8be9-823b399c664cÙ$1e3d231a-4065-48ce-a74e-018066fb232a²upstream_cells_map«BatchMethod‘Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0Ù$889611fb-7dac-4769-9251-9a90e3a1422f„´precedence_heuristic §cell_idÙ$889611fb-7dac-4769-9251-9a90e3a1422f´downstream_cells_mapªstatestyle‘Ù$902738c3-2f7b-49cb-8580-29359c857027²upstream_cells_map€Ù$5455fc97-55cb-4b0e-a3be-9433ccc96fc0„´precedence_heuristic §cell_idÙ$5455fc97-55cb-4b0e-a3be-9433ccc96fc0´downstream_cells_mapƒ§nstates“Ù$e4c6456c-867d-4ade-a3c8-310c1e065f14Ù$9db7a268-1e6d-4366-a0ec-ebf54916d3b0Ù$4b0d96d0-25d1-4fed-b105-c65fa2883a61¥delay‘Ù$53145cc2-784c-468b-8e91-9bb7866db218©start_mrp‘Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6²upstream_cells_mapŒ§@md_str¤Core¡:¨Base.get¥@bind¦Slider¤Base«PlutoRunner·PlutoRunner.create_bond¯Core.applicable¦Button¨getindexÙ$24a441c8-7aaf-4642-b245-5e1201456d67„´precedence_heuristic 
§cell_idÙ$24a441c8-7aaf-4642-b245-5e1201456d67´downstream_cells_map¬check_policy“Ù$eb735ead-978b-409c-8990-b5fa7a027ebfÙ$415ea466-2038-48fe-9d24-39a90182f1ebÙ$3f3ebc9b-b070-4d73-8be9-823b399c664c²upstream_cells_mapŒºMain.Base.inferencebarrier§@assert¤size§nothing¦length¤Main¥throw®AssertionError¦Matrix¢==¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloatÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdab„´precedence_heuristic §cell_idÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdab´downstream_cells_map€²upstream_cells_map…³king_action_display‘Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217ªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b³display_king_policy‘Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4´stochastic_gridworld‘Ù$4ddc7d99-0b79-4689-bd93-8798b105c0a2»show_gridworld_policy_value‘Ù$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$21fbdc3b-4444-4f56-9934-fb58e184d685„´precedence_heuristic §cell_idÙ$21fbdc3b-4444-4f56-9934-fb58e184d685´downstream_cells_map¬fig_6_3_load‘Ù$cafedde8-be94-4697-a511-510a5fea0155²upstream_cells_mapŠ¤Core§@md_str¤Base·PlutoRunner.create_bond«PlutoRunner¨CheckBox¯Core.applicable¥@bind¨Base.get¨getindexÙ$30e663da-282c-42ff-8171-dbe3c5c467c6„´precedence_heuristic §cell_idÙ$30e663da-282c-42ff-8171-dbe3c5c467c6´downstream_cells_map´makepolicyvalueplots’Ù$bb085f2e-83cb-45b2-adf6-c07da892d6e1Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893²upstream_cells_mapÞ¸LaTeXStrings.latexstring¡:¦@L_str¨relayout§Integer¦Vector¤Real³makepolicyvaluemaps‘Ù$7ed07ddc-1c63-4ce7-bfd3-6da54304d297¡-¤plot¨latexify§heatmap¤attr«CompleteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¦Matrix¬LaTeXStrings‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¦LayoutÙ$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4„´precedence_heuristic §cell_idÙ$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4´downstream_cells_map³display_king_policy•Ù$f0f9d3d5-e76a-4472-bfb1-da29d73a7916Ù$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06Ù$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6dÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdab²upstream_cells_map‡Ù 
HypertextLiteral.attribute_value·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¤@htlAbstractFloat¦VectorÙ$84a71bf8-0d66-42cd-ac7b-589d63a16eda„´precedence_heuristic §cell_idÙ$84a71bf8-0d66-42cd-ac7b-589d63a16eda´downstream_cells_map´create_greedy_policy”Ù$6bffb08c-704a-4b7c-bfce-b3d099cf35c0Ù$84584793-8274-4aa1-854f-b167c7434548Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820cÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134²upstream_cells_map‡¤Real¡:¦Matrix¥zeros³make_greedy_policy!“Ù$cb07a6a5-c50a-4900-9e5b-a17dc7ee5710Ù$685a7ba3-0f94-4663-a68a-73fa03bd9445Ù$c4919d14-8cba-43e6-9369-efc52bcb9b23¤size¤copyÙ$c9f7646a-ec01-4d90-9215-5027b7c1c885„´precedence_heuristic §cell_idÙ$c9f7646a-ec01-4d90-9215-5027b7c1c885´downstream_cells_map¦Î±_6_8‘Ù$b5e06f59-33b5-414e-9a81-43e8abd07aa3²upstream_cells_map‹§@md_str¤Core¡:¨Base.get¥@bind¦Slider¤Base«PlutoRunner·PlutoRunner.create_bond¯Core.applicable¨getindexÙ$8e34202a-f841-4464-9017-cd50194f7987„´precedence_heuristic §cell_idÙ$8e34202a-f841-4464-9017-cd50194f7987´downstream_cells_map²make_random_policy–Ù$7035c082-6e50-4df5-919f-5f09d2011b4aÙ$64fe8336-d1c2-41fe-a522-1b6f63260fc9Ù$2786101e-d365-4d6a-8de7-b9794499efb4Ù$ddf3bb61-16c9-48c4-95d4-263260309762Ù$1e3d231a-4065-48ce-a74e-018066fb232aÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521²upstream_cells_map…¦length¡/¤ones¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloatÙ$95245673-2c29-401e-bb4b-a39dc8172297„´precedence_heuristic 
§cell_idÙ$95245673-2c29-401e-bb4b-a39dc8172297´downstream_cells_map´create_gridworld_mdp”Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$71774d5f-7841-403f-bc6b-1a0cbbb72d6dÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4Ù$0e488135-49e5-4e71-83b1-05d8e61f0510²upstream_cells_mapˆ¦lengthªapply_wind‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¡:´make_windy_gridworld‘Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae§Float32©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¥zerosÙ$c34678f6-53bb-4f2a-96f0-a7b16f894ddd„´precedence_heuristic §cell_idÙ$c34678f6-53bb-4f2a-96f0-a7b16f894ddd´downstream_cells_map»show_gridworld_policy_value•Ù$897fde24-9a4a-465e-96f2-dd9e8baab294Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6dÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdabÙ$b5e06f59-33b5-414e-9a81-43e8abd07aa3Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_mapŽ¡:°show_grid_policy‘Ù$9da5fd84-800d-4b3e-8627-e90ce8f20297·HypertextLiteral.Bypass¸HypertextLiteral.content©plot_path’Ù$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521¤rand¦String¤@htl³display_rook_policy‘Ù$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6¯show_grid_value’Ù$8bc54c94-9c92-4904-b3a6-13ff3f0110bbÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cb³rook_action_display‘Ù$500d8dd4-fc53-4021-b797-114224ca4deb°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb·HypertextLiteral.ResultÙ$e4e80015-40ce-4f8a-aac7-4a9584da4baa„´precedence_heuristic §cell_idÙ$e4e80015-40ce-4f8a-aac7-4a9584da4baa´downstream_cells_map€²upstream_cells_map‚«example_6_8‘Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302«ex_6_8_load‘Ù$d83ff60f-8973-4dc1-9358-5ad109ea5490Ù$64fe8336-d1c2-41fe-a522-1b6f63260fc9„´precedence_heuristic 
§cell_idÙ$64fe8336-d1c2-41fe-a522-1b6f63260fc9´downstream_cells_map¦Ï€_mrp‘Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6²upstream_cells_map‚§mrp_6_2‘Ù$4b0d96d0-25d1-4fed-b105-c65fa2883a61²make_random_policy‘Ù$8e34202a-f841-4464-9017-cd50194f7987Ù$dea61907-d4fb-492d-b2bb-c037c7f785cb„´precedence_heuristic §cell_idÙ$dea61907-d4fb-492d-b2bb-c037c7f785cb´downstream_cells_map¶bellman_optimal_value!‘Ù$8787a5fd-d0ab-46b5-a7df-e7bc103a7378²upstream_cells_mapÞ¤zero©@fastmath§typemin¶Base.FastMath.sub_fast¦isless©@inbounds§nothing¦Vector¡<¶Base.FastMath.max_fast¯Base.simd_index©eachindex¤Real¥@simd¶Base.FastMath.abs_fast®julia.simdloop¶Base.FastMath.div_fast£eps¤BaseµBase.simd_outer_range©enumerate©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¶Base.FastMath.add_fast¶Base.simd_inner_length¡+¶Base.FastMath.mul_fastÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cb„´precedence_heuristic §cell_idÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cb´downstream_cells_map¯show_grid_value’Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894ddd²upstream_cells_mapÞ³king_action_display‘Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217©findfirst¡:·HypertextLiteral.Bypass¸HypertextLiteral.content©mapreduce¤@htl¦Vector©eachindex¡-¤HTMLÙ HypertextLiteral.attribute_value·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¹HypertextLiteral.StyleTag¡*¥round§maximumÙ$d299d800-a64e-4ba2-9603-efa833343405„´precedence_heuristic 
§cell_idÙ$d299d800-a64e-4ba2-9603-efa833343405´downstream_cells_map«example_6_5”Ù$04a0be81-ee5f-4eeb-963a-ad930392d50bÙ$f0f9d3d5-e76a-4472-bfb1-da29d73a7916Ù$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06Ù$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7²upstream_cells_mapÞ°show_grid_policy‘Ù$9da5fd84-800d-4b3e-8627-e90ce8f20297¥sarsa‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0¯windy_gridworld‘Ù$ab331778-f892-4690-8bb3-26464e3fc05f·begin_value_iteration_v“Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1Ù$3134e913-1e86-495d-a558-c3ec4828bf7b¦String¤@htl¦length³display_rook_policy‘Ù$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6§scatter³rook_action_display‘Ù$500d8dd4-fc53-4021-b797-114224ca4deb·HypertextLiteral.Result¤fill´create_gridworld_mdp’Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$7ac99619-5232-4db8-8553-d79ea5415d29ªrunepisode’Ù$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4a¡:·HypertextLiteral.Bypass©plot_path’Ù$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521¸HypertextLiteral.content¤rand£end¦cumsum¡-¯show_grid_value’Ù$8bc54c94-9c92-4904-b3a6-13ff3f0110bbÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cb¤plot°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¤attr©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¦Layout»stochastic_gridworld_mdp_dp‘Ù$8e15f4b5-0dc7-47a5-9477-9f4d8807b331Ù$c5718459-2323-4615-b2c4-f92a0fa189d9„´precedence_heuristic §cell_idÙ$c5718459-2323-4615-b2c4-f92a0fa189d9´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$c306867b-f137-44f2-97dd-3d10c226ca5c„´precedence_heuristic §cell_idÙ$c306867b-f137-44f2-97dd-3d10c226ca5c´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0„´precedence_heuristic 
§cell_idÙ$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0´downstream_cells_map€²upstream_cells_map‚ºgridworld_Q_vs_sarsa_solve‘Ù$6bffb08c-704a-4b7c-bfce-b3d099cf35c0ªcliffworld‘Ù$6faa3015-3ac4-44af-a78c-10b175822441Ù$410abe1d-04a6-4434-9abf-0d29dd6498e6„´precedence_heuristic §cell_idÙ$410abe1d-04a6-4434-9abf-0d29dd6498e6´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$aa0791a5-8cf1-499b-9900-4d0c59be808c„´precedence_heuristic §cell_idÙ$aa0791a5-8cf1-499b-9900-4d0c59be808c´downstream_cells_map¯stochastic_wind‘Ù$4ddc7d99-0b79-4689-bd93-8798b105c0a2²upstream_cells_map„¨wind_var‘Ù$02f34da1-551f-4ce5-a588-7f3a14afd716¡+¢==¤randÙ$510761f6-66c7-4faf-937b-e1422ec829a6„´precedence_heuristic §cell_idÙ$510761f6-66c7-4faf-937b-e1422ec829a6´downstream_cells_map€²upstream_cells_map¤HTMLÙ$0b9c6dbd-4eb3-4167-886e-64db9ec7ff04„´precedence_heuristic §cell_idÙ$0b9c6dbd-4eb3-4167-886e-64db9ec7ff04´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$a9dda9b5-f568-481c-9e8f-9bb887468775„´precedence_heuristic §cell_idÙ$a9dda9b5-f568-481c-9e8f-9bb887468775´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$ad03500a-bd42-4216-a9cb-3f923152af79„´precedence_heuristic §cell_idÙ$ad03500a-bd42-4216-a9cb-3f923152af79´downstream_cells_mapÙ create_car_rental_afterstate_mdp‘Ù$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1²upstream_cells_mapÞ"¤Dict¤zero£sum¡>§poisson‘Ù$b3d4117f-7db4-43a6-8427-c08f3542d71f¦isless£zip¤Real¦length¡<©eachindex£min§Float32©intersect§setdiff¢==£abs«@NamedTuple³FiniteAfterstateMDP‘Ù$393cd9d2-dd97-496e-b260-ec6e8b1c13b5¡:§collect¥zeros¦reduce¤size¢=>§Integerªmakelookup‘Ù$834e5810-77ea-4dfd-9f37-9d9dbf6585a4¤Base§findall¥Int64¡-©enumerate¡+¡*Ù$de50f95f-984e-4387-958c-64e0265f5953„´precedence_heuristic §cell_idÙ$de50f95f-984e-4387-958c-64e0265f5953´downstream_cells_map«render_walk‘Ù$e4c6456c-867d-4ade-a3c8-310c1e065f14²upstream_cells_mapÞ¡:©Iterators¥error·HypertextLiteral.Bypass¡>¦isless¢|>®Iterators.take¸HypertextLiteral.content¤ceil©mapreduce¤@htl§collect¡<¥Int64Ù 
HypertextLiteral.attribute_value¤HTML¡/°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8·HypertextLiteral.Result¹HypertextLiteral.StyleTagÙ$c8500b89-644d-407f-881a-bcbd7da23502„´precedence_heuristic §cell_idÙ$c8500b89-644d-407f-881a-bcbd7da23502´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$84d81413-6334-4965-8632-8a763cd3f28a„´precedence_heuristic §cell_idÙ$84d81413-6334-4965-8632-8a763cd3f28a´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$33d69db9-fa2b-40a3-bbed-21d5fd60f302„´precedence_heuristic §cell_idÙ$33d69db9-fa2b-40a3-bbed-21d5fd60f302´downstream_cells_map«example_6_8‘Ù$e4e80015-40ce-4f8a-aac7-4a9584da4baa²upstream_cells_mapÞ#¥sarsa‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0µdouble_expected_sarsa‘Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820c·begin_value_iteration_v“Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1Ù$3134e913-1e86-495d-a558-c3ec4828bf7b¤@htl¨gridsize‘Ù$0c0b875e-69f8-46ed-ad06-df9c36088fbe«deserializeªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b©eachindex§scatter¡/±double_q_learning‘Ù$d526a3a4-63cc-4f94-8f55-98c9a4a9d134·HypertextLiteral.Result¤last¤fill»show_gridworld_policy_value‘Ù$c34678f6-53bb-4f2a-96f0-a7b16f894ddd®expected_sarsa‘Ù$292d9018-b550-4278-a8e0-78dd6a6853f1²noisy_gridworld_dp‘Ù$297f1606-4ec2-4075-9f81-926dc517b76f¡:¯noisy_gridworld‘Ù$98bec66e-d8f3-4d4d-b4ec-5838489164e5·HypertextLiteral.Bypass¥first¢|>¸HypertextLiteral.content©mapreduce¦isfile©enumerate¤HTML¤plot¦foldxt°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¡*£Map¦Layout©serializeÙ$3f3ebc9b-b070-4d73-8be9-823b399c664c„´precedence_heuristic 
§cell_idÙ$3f3ebc9b-b070-4d73-8be9-823b399c664c´downstream_cells_map¯batch_value_est‘Ù$1e3d231a-4065-48ce-a74e-018066fb232a²upstream_cells_mapÞ¤zero¥Tuple¡>¦isless¦Vector¦length¡<«BatchMethod‘Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0¦Matrixupdate_value!’Ù$209881b3-3ac8-490e-97bd-fa5ae24a39f5Ù$72b4d8d5-464c-4561-8c69-28ef3f59630b£TD0‘Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5ªrunepisode’Ù$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4aAbstractFloat¡:£max¥zeros§Integer§typemax£end§findall¥push!¶initialize_state_value‘Ù$401831c3-3925-465c-a093-28686f0dad2e¡-©enumerate¡+¬check_policy‘Ù$24a441c8-7aaf-4642-b245-5e1201456d67Ù$d5b612d8-82a1-4586-b721-1baaea2101cf„´precedence_heuristic §cell_idÙ$d5b612d8-82a1-4586-b721-1baaea2101cf´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06„´precedence_heuristic §cell_idÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06´downstream_cells_map€²upstream_cells_map†³display_king_policy‘Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4¬king_actions‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1´make_windy_gridworld‘Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6«example_6_5‘Ù$d299d800-a64e-4ba2-9603-efa833343405¯action3_display‘Ù$d259ecca-0249-4b28-a4d7-6880d4d84495¤Stay‘Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7deÙ$897fde24-9a4a-465e-96f2-dd9e8baab294„´precedence_heuristic §cell_idÙ$897fde24-9a4a-465e-96f2-dd9e8baab294´downstream_cells_map€²upstream_cells_mapƒªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b¯windy_gridworld‘Ù$ab331778-f892-4690-8bb3-26464e3fc05f»show_gridworld_policy_value‘Ù$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$1e3d231a-4065-48ce-a74e-018066fb232a„´precedence_heuristic 
§cell_idÙ$1e3d231a-4065-48ce-a74e-018066fb232a´downstream_cells_map«example_6_3‘Ù$22c2213e-5b9b-410f-a0ef-8f1e3db3c532²upstream_cells_mapÞ§@md_str¡:¨make_mrp‘Ù$4ddcd409-c31c-444c-8fcf-7cc45b68d93b¤sqrt§collect¢MC‘Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0²make_random_policy‘Ù$8e34202a-f841-4464-9017-cd50194f7987¡-§scatter¤plot¡/¡^¡+£TD0‘Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0¦Layout¯batch_value_est‘Ù$3f3ebc9b-b070-4d73-8be9-823b399c664c¤mean¨getindexÙ$0f22e85f-ed31-49df-a7c7-0579298f05fe„´precedence_heuristic §cell_idÙ$0f22e85f-ed31-49df-a7c7-0579298f05fe´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$9017093c-a9c3-40ea-a9c6-881ee62fc379„´precedence_heuristic §cell_idÙ$9017093c-a9c3-40ea-a9c6-881ee62fc379´downstream_cells_map€²upstream_cells_mapƒ§@md_str«example_6_1‘Ù$bc8bad61-a49a-47d6-8fa6-7dcf6c221910¨getindexÙ$4b0d96d0-25d1-4fed-b105-c65fa2883a61„´precedence_heuristic §cell_idÙ$4b0d96d0-25d1-4fed-b105-c65fa2883a61´downstream_cells_map§mrp_6_2’Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6Ù$64fe8336-d1c2-41fe-a522-1b6f63260fc9²upstream_cells_map‚¨make_mrp‘Ù$4ddcd409-c31c-444c-8fcf-7cc45b68d93b§nstates‘Ù$5455fc97-55cb-4b0e-a3be-9433ccc96fc0Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6d„´precedence_heuristic §cell_idÙ$1115f3ec-f4b2-4fba-bd5e-321a63b10a6d´downstream_cells_map€²upstream_cells_map…³king_action_display‘Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217®king_gridworld‘Ù$dda222ef-8178-40bb-bf20-d242924c4fabªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b³display_king_policy‘Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4»show_gridworld_policy_value‘Ù$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$1e3b3234-3fe1-46c9-82b7-f729c656eb25„´precedence_heuristic §cell_idÙ$1e3b3234-3fe1-46c9-82b7-f729c656eb25´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$6029990b-eb31-45ae-a869-b789fba673a6„´precedence_heuristic §cell_idÙ$6029990b-eb31-45ae-a869-b789fba673a6´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$61bbf9db-49a0-4709-83f4-44f228be09c0„´precedence_heuristic 
§cell_idÙ$61bbf9db-49a0-4709-83f4-44f228be09c0´downstream_cells_map¥sarsa–Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$6bffb08c-704a-4b7c-bfce-b3d099cf35c0Ù$84584793-8274-4aa1-854f-b167c7434548Ù$6d9ae541-cf8c-4687-9f0a-f008944657e3Ù$69eedbfd-396f-4461-b7a1-c36abc094581Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_mapÞ¤zero¡!£one¦Vector¦length¤copy¡/½initialize_state_action_value‘Ù$c5d32889-634b-4b00-8ba7-0d1ecaf94f05¢==¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5©init_step‘Ù$3ed12c33-ab0a-49b1-b9e7-c4305ba35767AbstractFloat¡:¥first¥zerosªsarsa_step‘Ù$12aac612-758b-4655-8ede-daddd4af6d3e§findall¥Int64¡-¡+¥undef¡*¶make_Ïµ_greedy_policy!‘Ù$6b496582-cc0e-4195-87ef-94792b0fff54·create_Ïµ_greedy_policy‘Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$814d89be-cfdf-11ec-3295-49a8f302bbcf„´precedence_heuristic §cell_idÙ$814d89be-cfdf-11ec-3295-49a8f302bbcf´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$52aebb7b-c2a9-443f-bc03-24cd25793b32„´precedence_heuristic §cell_idÙ$52aebb7b-c2a9-443f-bc03-24cd25793b32´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8„´precedence_heuristic §cell_idÙ$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8´downstream_cells_mapªcalc_error’Ù$209881b3-3ac8-490e-97bd-fa5ae24a39f5Ù$72b4d8d5-464c-4561-8c69-28ef3f59630b²upstream_cells_map‰£eps¢<=¤zero§typemax¡-¡/£one£absAbstractFloatÙ$031e1106-7408-4c7e-b78e-b713c19123d1„´precedence_heuristic 
§cell_idÙ$031e1106-7408-4c7e-b78e-b713c19123d1´downstream_cells_map‡¬king_actions—Ù$2155adfa-7a93-4960-950e-1b123da9eea4Ù$dda222ef-8178-40bb-bf20-d242924c4fabÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06Ù$4ddc7d99-0b79-4689-bd93-8798b105c0a2Ù$2f4e2da2-b1a1-41b1-8904-39b59f426da4Ù$0e488135-49e5-4e71-83b1-05d8e61f0510Ù$8e15f4b5-0dc7-47a5-9477-9f4d8807b331§UpRight‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1¨DownLeft‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1¤move’Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196°diagonal_actions¦UpLeft‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1©DownRight‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1²upstream_cells_mapˆ¯GridworldAction‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¡-§UpRight‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1¨DownLeft‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1¡+©DownRight‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1¦UpLeft‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1¬rook_actions‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$7035c082-6e50-4df5-919f-5f09d2011b4a„´precedence_heuristic §cell_idÙ$7035c082-6e50-4df5-919f-5f09d2011b4a´downstream_cells_mapªrunepisode•Ù$415ea466-2038-48fe-9d24-39a90182f1ebÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6Ù$3f3ebc9b-b070-4d73-8be9-823b399c664cÙ$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$d299d800-a64e-4ba2-9603-efa833343405²upstream_cells_map‚²make_random_policy‘Ù$8e34202a-f841-4464-9017-cd50194f7987¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$bfe71b40-3157-47df-8494-67f8eb8e4e93„´precedence_heuristic §cell_idÙ$bfe71b40-3157-47df-8494-67f8eb8e4e93´downstream_cells_mapªrunepisode•Ù$415ea466-2038-48fe-9d24-39a90182f1ebÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6Ù$3f3ebc9b-b070-4d73-8be9-823b399c664cÙ$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$d299d800-a64e-4ba2-9603-efa833343405²upstream_cells_mapŠ¤Real¢<=¨takestep‘Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc¥push!¡!¦Matrix£Inf¡+¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5¦VectorÙ$b35264b0-ac5b-40ce-95e4-9b2bc4cb106f„´precedence_heuristic 
§cell_idÙ$b35264b0-ac5b-40ce-95e4-9b2bc4cb106f´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$d259ecca-0249-4b28-a4d7-6880d4d84495„´precedence_heuristic §cell_idÙ$d259ecca-0249-4b28-a4d7-6880d4d84495´downstream_cells_map¯action3_display‘Ù$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06²upstream_cells_map„·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¤@htlÙ$22c4ce8c-bd82-4eb3-8af5-55342018edff„´precedence_heuristic §cell_idÙ$22c4ce8c-bd82-4eb3-8af5-55342018edff´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$6faa3015-3ac4-44af-a78c-10b175822441„´precedence_heuristic §cell_idÙ$6faa3015-3ac4-44af-a78c-10b175822441´downstream_cells_mapªcliffworld“Ù$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0Ù$667666b9-3ab6-4836-953d-9878208103c9Ù$cafedde8-be94-4697-a511-510a5fea0155²upstream_cells_map¯make_cliffworld‘Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196Ù$fa04d20f-6e3f-46f8-b3f7-a543d1fa360a„´precedence_heuristic §cell_idÙ$fa04d20f-6e3f-46f8-b3f7-a543d1fa360a´downstream_cells_map¶max_bias_visualization‘Ù$ff5d051e-5de1-48a9-9578-5dbafd71afd1²upstream_cells_mapÞ¥randn¡:¤vcat¤hcat§collect¦reduce©mapreduce§eachrow§adjoint£end¨cum_mean‘Ù$bce6e4ab-58ec-4e00-be34-bc4caf51f57d§eachcol©enumerate§scatter§cum_max‘Ù$7d3be915-9092-4261-8435-dd546a7db144¡/¤plot¡+¤fill¤conj¦LayoutÙ$297f1606-4ec2-4075-9f81-926dc517b76f„´precedence_heuristic §cell_idÙ$297f1606-4ec2-4075-9f81-926dc517b76f´downstream_cells_map²noisy_gridworld_dp‘Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_map…¯noisy_gridworld‘Ù$98bec66e-d8f3-4d4d-b4ec-5838489164e5noisy_rewards‘Ù$943b6d7e-14a4-4532-90c7-dd5080be0c6e¥firstºcreate_noisy_gridworld_mdp‘Ù$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4¤lastÙ$f2776908-d06a-4073-b2ce-ecbf109c9cc7„´precedence_heuristic §cell_idÙ$f2776908-d06a-4073-b2ce-ecbf109c9cc7´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$d83ff60f-8973-4dc1-9358-5ad109ea5490„´precedence_heuristic 
§cell_idÙ$d83ff60f-8973-4dc1-9358-5ad109ea5490´downstream_cells_map«ex_6_8_load‘Ù$e4e80015-40ce-4f8a-aac7-4a9584da4baa²upstream_cells_mapŠ¤Core§@md_str¤Base·PlutoRunner.create_bond«PlutoRunner¨CheckBox¯Core.applicable¥@bind¨Base.get¨getindexÙ$105c5c23-270d-437e-89dd-12297814c6e0„´precedence_heuristic §cell_idÙ$105c5c23-270d-437e-89dd-12297814c6e0´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$e8f94345-9ad5-48d4-8709-d796fb55db3f„´precedence_heuristic §cell_idÙ$e8f94345-9ad5-48d4-8709-d796fb55db3f´downstream_cells_map€²upstream_cells_map¬exercise_6_5‘Ù$ddf3bb61-16c9-48c4-95d4-263260309762Ù$64b210e8-223f-41f7-a6b7-8af6183ddf87„´precedence_heuristic §cell_idÙ$64b210e8-223f-41f7-a6b7-8af6183ddf87´downstream_cells_map´make_noisy_gridworld‘Ù$98bec66e-d8f3-4d4d-b4ec-5838489164e5²upstream_cells_map†´make_windy_gridworld‘Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6noisy_rewards‘Ù$943b6d7e-14a4-4532-90c7-dd5080be0c6e®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¤rand¤fill¬rook_actions‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4„´precedence_heuristic §cell_idÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4´downstream_cells_mapµking_gridworld_mdp_dp²upstream_cells_map„¬king_actions‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb´create_gridworld_mdp’Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$7ac99619-5232-4db8-8553-d79ea5415d29Ù$bc8bad61-a49a-47d6-8fa6-7dcf6c221910„´precedence_heuristic §cell_idÙ$bc8bad61-a49a-47d6-8fa6-7dcf6c221910´downstream_cells_map«example_6_1’Ù$6edb550d-5c9f-4ea6-8746-6632806df11eÙ$9017093c-a9c3-40ea-a9c6-881ee62fc379²upstream_cells_map‹¡:£zip£end©eachindex§scatter¤plot¡+¤attr¤last¤fill¦LayoutÙ$2455742f-dc18-4d6b-9f58-5666adac6919„´precedence_heuristic 
§cell_idÙ$2455742f-dc18-4d6b-9f58-5666adac6919´downstream_cells_mapµcreate_car_rental_mdp‘Ù$c2f56287-9a3e-454a-9ec1-53184b788db9²upstream_cells_mapÞ¤Dict¤zero£sum¡>§poisson‘Ù$b3d4117f-7db4-43a6-8427-c08f3542d71f¦isless£zip¤Real¦length¡<©eachindex£min©intersect§setdiff¢==£abs«@NamedTuple¡:§collect¥zeros¦reduce¤size¢=>§Integer¤Base§findall¡-©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae©enumerate¡+¡*Ù$f474fcbd-e3c3-49fd-a6b7-6d6a8a7dda09„´precedence_heuristic §cell_idÙ$f474fcbd-e3c3-49fd-a6b7-6d6a8a7dda09´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$69eedbfd-396f-4461-b7a1-c36abc094581„´precedence_heuristic §cell_idÙ$69eedbfd-396f-4461-b7a1-c36abc094581´downstream_cells_map¯example_6_7_mdp‘Ù$00d67a93-437c-4cda-899a-9daa1102e1f2²upstream_cells_mapÞ¥randn¤Term‘Ù$4382928c-6325-4ecd-b7cf-282525a270ab¥sarsa‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0µdouble_expected_sarsa‘Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820c«deserializeªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b¡B‘Ù$4382928c-6325-4ecd-b7cf-282525a270ab§scatter§Float32¡/±double_q_learning‘Ù$d526a3a4-63cc-4f94-8f55-98c9a4a9d134¤last¤fill¢==¤mean¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5®expected_sarsa‘Ù$292d9018-b550-4278-a8e0-78dd6a6853f1¡A‘Ù$4382928c-6325-4ecd-b7cf-282525a270ab¡:§collect¥zeros¦isfile§Integer¡-¤plot¶make_Ïµ_greedy_policy!‘Ù$6b496582-cc0e-4195-87ef-94792b0fff54¦Layout·create_Ïµ_greedy_policy‘Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799©serializeÙ$7ac99619-5232-4db8-8553-d79ea5415d29„´precedence_heuristic §cell_idÙ$7ac99619-5232-4db8-8553-d79ea5415d29´downstream_cells_map´create_gridworld_mdp”Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$71774d5f-7841-403f-bc6b-1a0cbbb72d6dÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4Ù$0e488135-49e5-4e71-83b1-05d8e61f0510²upstream_cells_map†¦length¡:©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae§Float32¥zeros¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$0163763b-a15f-447e-b3d2-32d4bf9d2605„´precedence_heuristic 
§cell_idÙ$0163763b-a15f-447e-b3d2-32d4bf9d2605´downstream_cells_map²max_visual_params2‘Ù$2651af2d-56a8-4f7e-a56a-45cabd665c72²upstream_cells_map§@md_str¤Core¡:§PlutoUI‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¢|>¨Base.get¥@bind¤Base«PlutoRunner·PlutoRunner.create_bond«NumberField§confirm¯Core.applicable¯PlutoUI.combine¨getindexÙ$53145cc2-784c-468b-8e91-9bb7866db218„´precedence_heuristic §cell_idÙ$53145cc2-784c-468b-8e91-9bb7866db218´downstream_cells_map¡t’Ù$54d97122-2d01-46ec-aafe-00bfc9f2d6d1Ù$1dd1ba55-548a-41f6-903e-70742fd60e3d²upstream_cells_map¤Core§PlutoUI‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¥delay‘Ù$5455fc97-55cb-4b0e-a3be-9433ccc96fc0¨Base.get¥@bindPlutoUI.Clock¦length¤Base«PlutoRunner·PlutoRunner.create_bond®mrp_trajectory‘Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6¯Core.applicable¡+Ù$6b496582-cc0e-4195-87ef-94792b0fff54„´precedence_heuristic §cell_idÙ$6b496582-cc0e-4195-87ef-94792b0fff54´downstream_cells_map¶make_Ïµ_greedy_policy!—Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$61bbf9db-49a0-4709-83f4-44f228be09c0Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8bÙ$292d9018-b550-4278-a8e0-78dd6a6853f1Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820cÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$69eedbfd-396f-4461-b7a1-c36abc094581²upstream_cells_map£sum¨isapprox®AbstractVector£one¤Real¦length©eachindex¡-¡/¡+¡*¢==§maximumÙ$9db7a268-1e6d-4366-a0ec-ebf54916d3b0„´precedence_heuristic §cell_idÙ$9db7a268-1e6d-4366-a0ec-ebf54916d3b0´downstream_cells_map€²upstream_cells_map‚«example_6_2‘Ù$2786101e-d365-4d6a-8de7-b9794499efb4§nstates‘Ù$5455fc97-55cb-4b0e-a3be-9433ccc96fc0Ù$c2f56287-9a3e-454a-9ec1-53184b788db9„´precedence_heuristic §cell_idÙ$c2f56287-9a3e-454a-9ec1-53184b788db9´downstream_cells_mapjacks_car_mdp‘Ù$bb085f2e-83cb-45b2-adf6-c07da892d6e1²upstream_cells_mapµcreate_car_rental_mdp‘Ù$2455742f-dc18-4d6b-9f58-5666adac6919Ù$18e60b1d-97ec-432c-a388-003e7fae415f„´precedence_heuristic 
§cell_idÙ$18e60b1d-97ec-432c-a388-003e7fae415f´downstream_cells_map¶bellman_optimal_value!‘Ù$8787a5fd-d0ab-46b5-a7df-e7bc103a7378²upstream_cells_mapÞ¤zero©@fastmath¶Base.FastMath.sub_fast¦isless©@inbounds§nothing¦length¡<¶Base.FastMath.max_fast¯Base.simd_index©eachindex¦Vector¥@simd¶Base.FastMath.abs_fast¤Real³FiniteAfterstateMDP‘Ù$393cd9d2-dd97-496e-b260-ec6e8b1c13b5¡:¥zeros®julia.simdloop¶Base.FastMath.div_fast£eps¤BaseµBase.simd_outer_rangeºBase.FastMath.maximum_fast©enumerate¶Base.simd_inner_length¶Base.FastMath.add_fast¡+¶Base.FastMath.mul_fastÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6„´precedence_heuristic §cell_idÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6´downstream_cells_map®mrp_trajectory“Ù$53145cc2-784c-468b-8e91-9bb7866db218Ù$54d97122-2d01-46ec-aafe-00bfc9f2d6d1Ù$1dd1ba55-548a-41f6-903e-70742fd60e3d²upstream_cells_map„§mrp_6_2‘Ù$4b0d96d0-25d1-4fed-b105-c65fa2883a61©start_mrp‘Ù$5455fc97-55cb-4b0e-a3be-9433ccc96fc0¦Ï€_mrp‘Ù$64fe8336-d1c2-41fe-a522-1b6f63260fc9ªrunepisode’Ù$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4aÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6b„´precedence_heuristic §cell_idÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6b´downstream_cells_map€²upstream_cells_map¬exercise_6_5‘Ù$ddf3bb61-16c9-48c4-95d4-263260309762Ù$0201ae9f-4a31-497e-86ab-62b454ca85de„´precedence_heuristic §cell_idÙ$0201ae9f-4a31-497e-86ab-62b454ca85de´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$b37f2395-1480-4c7c-b6c0-eba391e969d7„´precedence_heuristic §cell_idÙ$b37f2395-1480-4c7c-b6c0-eba391e969d7´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$6edb550d-5c9f-4ea6-8746-6632806df11e„´precedence_heuristic §cell_idÙ$6edb550d-5c9f-4ea6-8746-6632806df11e´downstream_cells_map€²upstream_cells_map«example_6_1‘Ù$bc8bad61-a49a-47d6-8fa6-7dcf6c221910Ù$01582b3b-c4d0-4691-9edf-f77e6d8be2c9„´precedence_heuristic 
§cell_idÙ$01582b3b-c4d0-4691-9edf-f77e6d8be2c9´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$7ed07ddc-1c63-4ce7-bfd3-6da54304d297„´precedence_heuristic §cell_idÙ$7ed07ddc-1c63-4ce7-bfd3-6da54304d297´downstream_cells_map³makepolicyvaluemaps‘Ù$30e663da-282c-42ff-8171-dbe3c5c467c6²upstream_cells_map¡:£sum¥zeros¤view¤size§findmax¦Vector¤Real¥Int64«CompleteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¦Matrix¡+¢==Ù$4862942b-d1e2-4ac8-8e88-65205e91a070„´precedence_heuristic §cell_idÙ$4862942b-d1e2-4ac8-8e88-65205e91a070´downstream_cells_map±max_visual_params‘Ù$ff5d051e-5de1-48a9-9578-5dbafd71afd1²upstream_cells_map§@md_str¤Core¡:§PlutoUI‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¢|>¨Base.get¥@bind¤Base«PlutoRunner·PlutoRunner.create_bond«NumberField§confirm¯Core.applicable¯PlutoUI.combine¨getindexÙ$a5009785-64b4-489b-a967-f7840b4a9463„´precedence_heuristic §cell_idÙ$a5009785-64b4-489b-a967-f7840b4a9463´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$eb735ead-978b-409c-8990-b5fa7a027ebf„´precedence_heuristic §cell_idÙ$eb735ead-978b-409c-8990-b5fa7a027ebf´downstream_cells_map²tabular_TD0_pred_V’Ù$2786101e-d365-4d6a-8de7-b9794499efb4Ù$ddf3bb61-16c9-48c4-95d4-263260309762²upstream_cells_mapÞ¤zero¡:¡!¥zeros§Integer¦Vector¦length§findall¨takestep‘Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc¡-©enumerate¶initialize_state_value‘Ù$401831c3-3925-465c-a093-28686f0dad2e¦Matrix¡+¬check_policy‘Ù$24a441c8-7aaf-4642-b245-5e1201456d67¡*¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloatÙ$2034fd1e-5171-4eda-85d5-2de62d7a1e8b„´precedence_heuristic 
§cell_idÙ$2034fd1e-5171-4eda-85d5-2de62d7a1e8b´downstream_cells_mapªq_learning™Ù$897fde24-9a4a-465e-96f2-dd9e8baab294Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6dÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdabÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0Ù$84584793-8274-4aa1-854f-b167c7434548Ù$6d9ae541-cf8c-4687-9f0a-f008944657e3Ù$69eedbfd-396f-4461-b7a1-c36abc094581Ù$b5e06f59-33b5-414e-9a81-43e8abd07aa3Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_mapÞ¤zero¡!£one¦Vector¦length¤copy©eachindex¡/½initialize_state_action_value‘Ù$c5d32889-634b-4b00-8ba7-0d1ecaf94f05¢==¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloat¡:¥first¥zeros§findall¥Int64¨takestep‘Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc¡-¡+¥undef¡*¶make_Ïµ_greedy_policy!‘Ù$6b496582-cc0e-4195-87ef-94792b0fff54§maximum·create_Ïµ_greedy_policy‘Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$4382928c-6325-4ecd-b7cf-282525a270ab„´precedence_heuristic §cell_idÙ$4382928c-6325-4ecd-b7cf-282525a270ab´downstream_cells_map„¡B‘Ù$69eedbfd-396f-4461-b7a1-c36abc094581¡A‘Ù$69eedbfd-396f-4461-b7a1-c36abc094581¤Term‘Ù$69eedbfd-396f-4461-b7a1-c36abc094581MaxBiasStates‘Ù$4382928c-6325-4ecd-b7cf-282525a270ab²upstream_cells_mapMaxBiasStates‘Ù$4382928c-6325-4ecd-b7cf-282525a270abÙ$8bc54c94-9c92-4904-b3a6-13ff3f0110bb„´precedence_heuristic §cell_idÙ$8bc54c94-9c92-4904-b3a6-13ff3f0110bb´downstream_cells_map¯show_grid_value’Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894ddd²upstream_cells_mapÞ³king_action_display‘Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217©findfirst¡:·HypertextLiteral.Bypass¸HypertextLiteral.content©mapreduce¤@htl¦Vector©eachindex¡-¤HTMLÙ HypertextLiteral.attribute_value·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¹HypertextLiteral.StyleTag¦Matrix¡*¥round§maximumÙ$4b1a4c14-3c2b-40c0-995c-cd0334ed8b3a„´precedence_heuristic 
§cell_idÙ$4b1a4c14-3c2b-40c0-995c-cd0334ed8b3a´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$f0f9d3d5-e76a-4472-bfb1-da29d73a7916„´precedence_heuristic §cell_idÙ$f0f9d3d5-e76a-4472-bfb1-da29d73a7916´downstream_cells_map€²upstream_cells_map„³king_action_display‘Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217®king_gridworld‘Ù$dda222ef-8178-40bb-bf20-d242924c4fab³display_king_policy‘Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4«example_6_5‘Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$4c1b286c-2ba9-4293-81e1-bf360baa75fa„´precedence_heuristic §cell_idÙ$4c1b286c-2ba9-4293-81e1-bf360baa75fa´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$3134e913-1e86-495d-a558-c3ec4828bf7b„´precedence_heuristic §cell_idÙ$3134e913-1e86-495d-a558-c3ec4828bf7b´downstream_cells_map·begin_value_iteration_v”Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302Ù$bb085f2e-83cb-45b2-adf6-c07da892d6e1Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893²upstream_cells_map†¤zero©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¤ones¡*¤size¤RealÙ$db31579e-3e56-4271-8fc3-eb13bc95ac27„´precedence_heuristic §cell_idÙ$db31579e-3e56-4271-8fc3-eb13bc95ac27´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$943b6d7e-14a4-4532-90c7-dd5080be0c6e„´precedence_heuristic §cell_idÙ$943b6d7e-14a4-4532-90c7-dd5080be0c6e´downstream_cells_mapnoisy_rewards’Ù$64b210e8-223f-41f7-a6b7-8af6183ddf87Ù$297f1606-4ec2-4075-9f81-926dc517b76f²upstream_cells_map€Ù$84584793-8274-4aa1-854f-b167c7434548„´precedence_heuristic 
§cell_idÙ$84584793-8274-4aa1-854f-b167c7434548´downstream_cells_mapÙ,gridworld_Q_vs_sarsa_vs_expected_sarsa_solve‘Ù$667666b9-3ab6-4836-953d-9878208103c9²upstream_cells_mapÞ¥sarsa‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0¥Tuple£zip¤@htlªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b©eachindex§scatter¡/·HypertextLiteral.Result¤fill®expected_sarsa‘Ù$292d9018-b550-4278-a8e0-78dd6a6853f1¡:·HypertextLiteral.Bypass©plot_path’Ù$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521¸HypertextLiteral.content©mapreduce´create_greedy_policy‘Ù$84a71bf8-0d66-42cd-ac7b-589d63a16eda¤plot°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¤attr¦LayoutÙ$9f28772c-9afe-4253-ab3b-055b0f48be6e„´precedence_heuristic §cell_idÙ$9f28772c-9afe-4253-ab3b-055b0f48be6e´downstream_cells_map©plot_path•Ù$75bfe913-8757-4789-b708-7d400c225218Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0Ù$84584793-8274-4aa1-854f-b167c7434548²upstream_cells_mapÞ¡*©findfirst¡:£max¦isless£end¦length¡-§scatter¤plot¡+¤attr¤last¤fill¦Layout§maximumªrunepisode’Ù$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4aÙ$1dd1ba55-548a-41f6-903e-70742fd60e3d„´precedence_heuristic §cell_idÙ$1dd1ba55-548a-41f6-903e-70742fd60e3d´downstream_cells_map€²upstream_cells_mapƒ®mrp_trajectory‘Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6¡t‘Ù$53145cc2-784c-468b-8e91-9bb7866db218®show_mrp_state‘Ù$87fadfc0-2cdb-4be2-81ad-e8fdeffb690cÙ$2a3e4617-efbb-4bbc-9c61-8535628e439c„´precedence_heuristic §cell_idÙ$2a3e4617-efbb-4bbc-9c61-8535628e439c´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$5f32fed0-c921-4cbb-85fe-ade54d4c6c95„´precedence_heuristic §cell_idÙ$5f32fed0-c921-4cbb-85fe-ade54d4c6c95´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$a3d10753-2ec3-4252-9629-834145678b6a„´precedence_heuristic 
§cell_idÙ$a3d10753-2ec3-4252-9629-834145678b6a´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$12aac612-758b-4655-8ede-daddd4af6d3e„´precedence_heuristic §cell_idÙ$12aac612-758b-4655-8ede-daddd4af6d3e´downstream_cells_mapªsarsa_step‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0²upstream_cells_map…sample_action‘Ù$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3f¦Matrix¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5¨Function¤RealÙ$2c49900b-3c57-4d9a-b3dc-ef9cc20c30c1„´precedence_heuristic §cell_idÙ$2c49900b-3c57-4d9a-b3dc-ef9cc20c30c1´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$e26f788e-f602-403e-929e-6c98a6e6bf79„´precedence_heuristic §cell_idÙ$e26f788e-f602-403e-929e-6c98a6e6bf79´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$c09530bc-f37e-4d57-a267-14d4027147da„´precedence_heuristic §cell_idÙ$c09530bc-f37e-4d57-a267-14d4027147da´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$0c0b875e-69f8-46ed-ad06-df9c36088fbe„´precedence_heuristic §cell_idÙ$0c0b875e-69f8-46ed-ad06-df9c36088fbe´downstream_cells_map¨gridsize“Ù$b5e06f59-33b5-414e-9a81-43e8abd07aa3Ù$98bec66e-d8f3-4d4d-b4ec-5838489164e5Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_map€Ù$8d05403a-adeb-40ac-a98a-87586d5a5170„´precedence_heuristic §cell_idÙ$8d05403a-adeb-40ac-a98a-87586d5a5170´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$44c49006-e210-4f97-916e-fe62f36c593f„´precedence_heuristic §cell_idÙ$44c49006-e210-4f97-916e-fe62f36c593f´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$0ad739c9-8aca-4b82-bf20-c73584d29535„´precedence_heuristic §cell_idÙ$0ad739c9-8aca-4b82-bf20-c73584d29535´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$0748902c-ffc0-4634-9a1b-e642b3dfb77b„´precedence_heuristic 
§cell_idÙ$0748902c-ffc0-4634-9a1b-e642b3dfb77b´downstream_cells_map²form_random_policy‘Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1²upstream_cells_map„¦length¡/«CompleteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¤onesÙ$6a1503c6-c77b-4e3a-9f07-74b2af1a5ff7„´precedence_heuristic §cell_idÙ$6a1503c6-c77b-4e3a-9f07-74b2af1a5ff7´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$292d9018-b550-4278-a8e0-78dd6a6853f1„´precedence_heuristic §cell_idÙ$292d9018-b550-4278-a8e0-78dd6a6853f1´downstream_cells_map®expected_sarsa”Ù$84584793-8274-4aa1-854f-b167c7434548Ù$6d9ae541-cf8c-4687-9f0a-f008944657e3Ù$69eedbfd-396f-4461-b7a1-c36abc094581Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_mapÞ¤zero£sum¡!£one¦Vector¦length¤copy©eachindex¡/½initialize_state_action_value‘Ù$c5d32889-634b-4b00-8ba7-0d1ecaf94f05¢==¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloat¡:¥first¥zeros§findall¥Int64¨takestep‘Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc¡-¡+¥undef¡*¶make_Ïµ_greedy_policy!‘Ù$6b496582-cc0e-4195-87ef-94792b0fff54·create_Ïµ_greedy_policy‘Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$07c57f37-22be-4c39-8279-d80addcea0c5„´precedence_heuristic §cell_idÙ$07c57f37-22be-4c39-8279-d80addcea0c5´downstream_cells_map¿create_stochastic_gridworld_mdp‘Ù$8e15f4b5-0dc7-47a5-9477-9f4d8807b331²upstream_cells_mapÞªapply_wind‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¡:´make_windy_gridworld‘Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6£max¥zeros¦isless¦length¡-©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae§Float32¡/£min¡+©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¢==®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$b5187232-d808-49b6-9f7e-a4cbeb6c2b3e„´precedence_heuristic §cell_idÙ$b5187232-d808-49b6-9f7e-a4cbeb6c2b3e´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$54d97122-2d01-46ec-aafe-00bfc9f2d6d1„´precedence_heuristic 
§cell_idÙ$54d97122-2d01-46ec-aafe-00bfc9f2d6d1´downstream_cells_map€²upstream_cells_mapˆ§@md_str¦length£min®mrp_trajectory‘Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6¦isless¥first¡t‘Ù$53145cc2-784c-468b-8e91-9bb7866db218¨getindexÙ$926ec37d-b969-4dc9-99b2-a6b29c6d880c„´precedence_heuristic §cell_idÙ$926ec37d-b969-4dc9-99b2-a6b29c6d880c´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$c360945e-f8b2-4c6f-a70c-6ab4ddcf5b54„´precedence_heuristic §cell_idÙ$c360945e-f8b2-4c6f-a70c-6ab4ddcf5b54´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$573a9919-bd7e-4a56-b830-4e40e91288ef„´precedence_heuristic §cell_idÙ$573a9919-bd7e-4a56-b830-4e40e91288ef´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6„´precedence_heuristic §cell_idÙ$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6´downstream_cells_map³display_rook_policy’Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894ddd²upstream_cells_map‡Ù HypertextLiteral.attribute_value·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¤@htlAbstractFloat¦VectorÙ$bb085f2e-83cb-45b2-adf6-c07da892d6e1„´precedence_heuristic §cell_idÙ$bb085f2e-83cb-45b2-adf6-c07da892d6e1´downstream_cells_mapƒ¥v_car«car_results¦Ï€_car²upstream_cells_map‡§@md_str´makepolicyvalueplots‘Ù$30e663da-282c-42ff-8171-dbe3c5c467c6£end¦length·begin_value_iteration_v“Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1Ù$3134e913-1e86-495d-a558-c3ec4828bf7bjacks_car_mdp‘Ù$c2f56287-9a3e-454a-9ec1-53184b788db9¨getindexÙ$e9359ca3-4d11-4365-bc6e-7babc6fcc7de„´precedence_heuristic 
§cell_idÙ$e9359ca3-4d11-4365-bc6e-7babc6fcc7de´downstream_cells_map‚¤move’Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196¤Stay“Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7deÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06Ù$0e488135-49e5-4e71-83b1-05d8e61f0510²upstream_cells_map‚¯GridworldAction‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¤Stay‘Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7deÙ$639840dc-976a-4e5c-987f-a92afb2d99d8„´precedence_heuristic§cell_idÙ$639840dc-976a-4e5c-987f-a92afb2d99d8´downstream_cells_map‹ªStatistics©StatsBase«Transducers§PlutoUI”Ù$53145cc2-784c-468b-8e91-9bb7866db218Ù$187fc682-2282-46ca-b988-c9de438f36fdÙ$4862942b-d1e2-4ac8-8e88-65205e91a070Ù$0163763b-a15f-447e-b3d2-32d4bf9d2605§Threads¨LatexifySerializationLinearAlgebra°HypertextLiteralÜÙ$de50f95f-984e-4387-958c-64e0265f5953Ù$902738c3-2f7b-49cb-8580-29359c857027Ù$2786101e-d365-4d6a-8de7-b9794499efb4Ù$62a9a36a-bedb-4f5a-80a4-2d4111a65c12Ù$4d7619ee-933f-452a-9202-e95a8f3da20fÙ$75bfe913-8757-4789-b708-7d400c225218Ù$500d8dd4-fc53-4021-b797-114224ca4debÙ$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4Ù$d259ecca-0249-4b28-a4d7-6880d4d84495Ù$8bc54c94-9c92-4904-b3a6-13ff3f0110bbÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cbÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297Ù$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0Ù$84584793-8274-4aa1-854f-b167c7434548Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302¬LaTeXStrings‘Ù$30e663da-282c-42ff-8171-dbe3c5c467c6«PlutoPlotly²upstream_cells_map¯TableOfContentsÙ$dd167494-99d6-45c6-99e4-c36fde5e2d3f„´precedence_heuristic §cell_idÙ$dd167494-99d6-45c6-99e4-c36fde5e2d3f´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$ab331778-f892-4690-8bb3-26464e3fc05f„´precedence_heuristic 
§cell_idÙ$ab331778-f892-4690-8bb3-26464e3fc05f´downstream_cells_map¯windy_gridworld“Ù$75bfe913-8757-4789-b708-7d400c225218Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$897fde24-9a4a-465e-96f2-dd9e8baab294²upstream_cells_map´make_windy_gridworld‘Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$0e59e813-3d48-4a24-b5b3-9a9de7c500c2„´precedence_heuristic §cell_idÙ$0e59e813-3d48-4a24-b5b3-9a9de7c500c2´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$e4c6456c-867d-4ade-a3c8-310c1e065f14„´precedence_heuristic §cell_idÙ$e4c6456c-867d-4ade-a3c8-310c1e065f14´downstream_cells_map€²upstream_cells_map‚«render_walk‘Ù$de50f95f-984e-4387-958c-64e0265f5953§nstates‘Ù$5455fc97-55cb-4b0e-a3be-9433ccc96fc0Ù$3e767962-7339-4f35-a039-b5521a098ed5„´precedence_heuristic §cell_idÙ$3e767962-7339-4f35-a039-b5521a098ed5´downstream_cells_map¦MDP_TDÜÙ$8e34202a-f841-4464-9017-cd50194f7987Ù$401831c3-3925-465c-a093-28686f0dad2eÙ$c5d32889-634b-4b00-8ba7-0d1ecaf94f05Ù$24a441c8-7aaf-4642-b245-5e1201456d67Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dcÙ$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4aÙ$eb735ead-978b-409c-8990-b5fa7a027ebfÙ$415ea466-2038-48fe-9d24-39a90182f1ebÙ$4ddcd409-c31c-444c-8fcf-7cc45b68d93bÙ$209881b3-3ac8-490e-97bd-fa5ae24a39f5Ù$72b4d8d5-464c-4561-8c69-28ef3f59630bÙ$3f3ebc9b-b070-4d73-8be9-823b399c664cÙ$12aac612-758b-4655-8ede-daddd4af6d3eÙ$3ed12c33-ab0a-49b1-b9e7-c4305ba35767Ù$61bbf9db-49a0-4709-83f4-44f228be09c0Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8bÙ$6556dafb-04fa-434c-868a-8d7bb7b5b196Ù$292d9018-b550-4278-a8e0-78dd6a6853f1Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820cÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$69eedbfd-396f-4461-b7a1-c36abc094581Ù$7ac99619-5232-4db8-8553-d79ea5415d29Ù$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4²upstream_cells_map†¦Vector¥Int64£new¤Dict¨Functionªmakelookup‘Ù$834e5810-77ea-4dfd-9f37-9d9dbf6585a4Ù$834e5810-77ea-4dfd-9f37-9d9dbf6585a4„´precedence_heuristic 
§cell_idÙ$834e5810-77ea-4dfd-9f37-9d9dbf6585a4´downstream_cells_mapªmakelookup“Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$393cd9d2-dd97-496e-b260-ec6e8b1c13b5Ù$ad03500a-bd42-4216-a9cb-3f923152af79²upstream_cells_map„¤Dict©enumerate¢=>¦VectorÙ$667666b9-3ab6-4836-953d-9878208103c9„´precedence_heuristic §cell_idÙ$667666b9-3ab6-4836-953d-9878208103c9´downstream_cells_map€²upstream_cells_map‚Ù,gridworld_Q_vs_sarsa_vs_expected_sarsa_solve‘Ù$84584793-8274-4aa1-854f-b167c7434548ªcliffworld‘Ù$6faa3015-3ac4-44af-a78c-10b175822441Ù$87fadfc0-2cdb-4be2-81ad-e8fdeffb690c„´precedence_heuristic §cell_idÙ$87fadfc0-2cdb-4be2-81ad-e8fdeffb690c´downstream_cells_map®show_mrp_state‘Ù$1dd1ba55-548a-41f6-903e-70742fd60e3d²upstream_cells_mapˆ¦length¡:£min¤HTML¢>=§collect¦isless¢==Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1„´precedence_heuristic §cell_idÙ$4019c974-dcaa-46c8-ac90-e6566a376ea1´downstream_cells_map·begin_value_iteration_v”Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302Ù$bb085f2e-83cb-45b2-adf6-c07da892d6e1Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893²upstream_cells_map‹¤zero§typemax¤copy£eps¥Int64¦Vector¤Real²value_iteration_v!‘Ù$8787a5fd-d0ab-46b5-a7df-e7bc103a7378«CompleteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae³make_greedy_policy!“Ù$cb07a6a5-c50a-4900-9e5b-a17dc7ee5710Ù$685a7ba3-0f94-4663-a68a-73fa03bd9445Ù$c4919d14-8cba-43e6-9369-efc52bcb9b23²form_random_policy‘Ù$0748902c-ffc0-4634-9a1b-e642b3dfb77bÙ$4d4577b5-3753-450d-a247-ebd8c3e8f799„´precedence_heuristic §cell_idÙ$4d4577b5-3753-450d-a247-ebd8c3e8f799´downstream_cells_map·create_Ïµ_greedy_policy–Ù$61bbf9db-49a0-4709-83f4-44f228be09c0Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8bÙ$292d9018-b550-4278-a8e0-78dd6a6853f1Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820cÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$69eedbfd-396f-4461-b7a1-c36abc094581²upstream_cells_map‡¤Real¡:¦Matrix¥zeros¤size¶make_Ïµ_greedy_policy!‘Ù$6b496582-cc0e-4195-87ef-94792b0fff54¤copyÙ$e19db54c-4b3c-42d1-b016-9620daf89bfb„´precedence_heuristic 
§cell_idÙ$e19db54c-4b3c-42d1-b016-9620daf89bfb´downstream_cells_mapŠ¢Up‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbªapply_wind“Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$07c57f37-22be-4c39-8279-d80addcea0c5¯GridworldAction•Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$031e1106-7408-4c7e-b78e-b713c19123d1Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7deÙ$6556dafb-04fa-434c-868a-8d7bb7b5b196¤Left‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¥Right‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb©wind_vals™Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$95245673-2c29-401e-bb4b-a39dc8172297Ù$07c57f37-22be-4c39-8279-d80addcea0c5Ù$71774d5f-7841-403f-bc6b-1a0cbbb72d6dÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4Ù$0e488135-49e5-4e71-83b1-05d8e61f0510Ù$8e15f4b5-0dc7-47a5-9477-9f4d8807b331®GridworldState˜Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196Ù$64b210e8-223f-41f7-a6b7-8af6183ddf87Ù$07c57f37-22be-4c39-8279-d80addcea0c5Ù$71774d5f-7841-403f-bc6b-1a0cbbb72d6dÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4Ù$0e488135-49e5-4e71-83b1-05d8e61f0510Ù$8e15f4b5-0dc7-47a5-9477-9f4d8807b331¬rook_actions•Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$031e1106-7408-4c7e-b78e-b713c19123d1Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196Ù$64b210e8-223f-41f7-a6b7-8af6183ddf87Ù$71774d5f-7841-403f-bc6b-1a0cbbb72d6d¤Down‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¤move’Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196²upstream_cells_mapˆ¢Up‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¥Int64¯GridworldAction‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¤Left‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¡-¥Right‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¡+¤Down‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7„´precedence_heuristic 
§cell_idÙ$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7´downstream_cells_map€²upstream_cells_map„³king_action_display‘Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217³display_king_policy‘Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4«example_6_5‘Ù$d299d800-a64e-4ba2-9603-efa833343405´stochastic_gridworld‘Ù$4ddc7d99-0b79-4689-bd93-8798b105c0a2Ù$393cd9d2-dd97-496e-b260-ec6e8b1c13b5„´precedence_heuristic §cell_idÙ$393cd9d2-dd97-496e-b260-ec6e8b1c13b5´downstream_cells_map³FiniteAfterstateMDP”Ù$18e60b1d-97ec-432c-a388-003e7fae415fÙ$685a7ba3-0f94-4663-a68a-73fa03bd9445Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9Ù$ad03500a-bd42-4216-a9cb-3f923152af79²upstream_cells_map‹¤Dict¥zerosªmakelookup‘Ù$834e5810-77ea-4dfd-9f37-9d9dbf6585a4¦Vector¥Int64¤Real£new¦length«CompleteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¦Matrix¥ArrayÙ$401831c3-3925-465c-a093-28686f0dad2e„´precedence_heuristic §cell_idÙ$401831c3-3925-465c-a093-28686f0dad2e´downstream_cells_map¶initialize_state_value“Ù$eb735ead-978b-409c-8990-b5fa7a027ebfÙ$415ea466-2038-48fe-9d24-39a90182f1ebÙ$3f3ebc9b-b070-4d73-8be9-823b399c664c²upstream_cells_map…¦length¤ones¡*¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloatÙ$2d881aa9-1da3-4d1e-8d05-245956dbaf33„´precedence_heuristic §cell_idÙ$2d881aa9-1da3-4d1e-8d05-245956dbaf33´downstream_cells_map€²upstream_cells_map¤HTMLÙ$047a8881-c2ec-4dd1-8778-e3acf9beba2e„´precedence_heuristic §cell_idÙ$047a8881-c2ec-4dd1-8778-e3acf9beba2e´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$29b0a2d5-9629-46cd-b57c-6f3ef797de66„´precedence_heuristic §cell_idÙ$29b0a2d5-9629-46cd-b57c-6f3ef797de66´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$c1d6532c-38a4-488f-9789-07d63fe6f125„´precedence_heuristic §cell_idÙ$c1d6532c-38a4-488f-9789-07d63fe6f125´downstream_cells_map©load_file‘Ù$00d67a93-437c-4cda-899a-9daa1102e1f2²upstream_cells_mapŠ¤Core§@md_str¤Base·PlutoRunner.create_bond«PlutoRunner¨CheckBox¯Core.applicable¥@bind¨Base.get¨getindexÙ$e6672866-c0a0-46f2-bb52-25fcc3352645„´precedence_heuristic 
§cell_idÙ$e6672866-c0a0-46f2-bb52-25fcc3352645´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$223055df-7d5c-4d99-bc8d-fbc9702f906f„´precedence_heuristic §cell_idÙ$223055df-7d5c-4d99-bc8d-fbc9702f906f´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$35dc0d94-145a-4292-b0df-9e84a286c036„´precedence_heuristic §cell_idÙ$35dc0d94-145a-4292-b0df-9e84a286c036´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$4d7619ee-933f-452a-9202-e95a8f3da20f„´precedence_heuristic §cell_idÙ$4d7619ee-933f-452a-9202-e95a8f3da20f´downstream_cells_map€²upstream_cells_map„·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¤@htlÙ$00d67a93-437c-4cda-899a-9daa1102e1f2„´precedence_heuristic §cell_idÙ$00d67a93-437c-4cda-899a-9daa1102e1f2´downstream_cells_map€²upstream_cells_map‚¯example_6_7_mdp‘Ù$69eedbfd-396f-4461-b7a1-c36abc094581©load_file‘Ù$c1d6532c-38a4-488f-9789-07d63fe6f125Ù$500d8dd4-fc53-4021-b797-114224ca4deb„´precedence_heuristic §cell_idÙ$500d8dd4-fc53-4021-b797-114224ca4deb´downstream_cells_map³rook_action_display“Ù$75bfe913-8757-4789-b708-7d400c225218Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894ddd²upstream_cells_map„·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¤@htlÙ$ff5d051e-5de1-48a9-9578-5dbafd71afd1„´precedence_heuristic §cell_idÙ$ff5d051e-5de1-48a9-9578-5dbafd71afd1´downstream_cells_map€²upstream_cells_map‚¶max_bias_visualization‘Ù$fa04d20f-6e3f-46f8-b3f7-a543d1fa360a±max_visual_params‘Ù$4862942b-d1e2-4ac8-8e88-65205e91a070Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9„´precedence_heuristic 
§cell_idÙ$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9´downstream_cells_map·begin_value_iteration_v”Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302Ù$bb085f2e-83cb-45b2-adf6-c07da892d6e1Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893²upstream_cells_map†¦length¤zero³FiniteAfterstateMDP‘Ù$393cd9d2-dd97-496e-b260-ec6e8b1c13b5¤ones¡*¤RealÙ$a925534e-f9b8-471a-9d86-c9212129b630„´precedence_heuristic §cell_idÙ$a925534e-f9b8-471a-9d86-c9212129b630´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3f„´precedence_heuristic §cell_idÙ$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3f´downstream_cells_mapsample_action“Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dcÙ$12aac612-758b-4655-8ede-daddd4af6d3eÙ$3ed12c33-ab0a-49b1-b9e7-c4305ba35767²upstream_cells_map‡¡:§weights¦Matrix¤size¦sample§IntegerAbstractFloatÙ$b5e06f59-33b5-414e-9a81-43e8abd07aa3„´precedence_heuristic §cell_idÙ$b5e06f59-33b5-414e-9a81-43e8abd07aa3´downstream_cells_map€²upstream_cells_map‰§@md_strªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b¨gridsize‘Ù$0c0b875e-69f8-46ed-ad06-df9c36088fbe¯noisy_gridworld‘Ù$98bec66e-d8f3-4d4d-b4ec-5838489164e5±double_q_learning‘Ù$d526a3a4-63cc-4f94-8f55-98c9a4a9d134¦Î±_6_8‘Ù$c9f7646a-ec01-4d90-9215-5027b7c1c885¤fill»show_gridworld_policy_value‘Ù$c34678f6-53bb-4f2a-96f0-a7b16f894ddd¨getindexÙ$a0d2333f-e87b-4981-bb52-d436ec6481c1„´precedence_heuristic §cell_idÙ$a0d2333f-e87b-4981-bb52-d436ec6481c1´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$f841c4d8-5176-4007-b472-9e01a799d85c„´precedence_heuristic §cell_idÙ$f841c4d8-5176-4007-b472-9e01a799d85c´downstream_cells_map«addelements‘Ù$902738c3-2f7b-49cb-8580-29359c857027²upstream_cells_map€Ù$685a7ba3-0f94-4663-a68a-73fa03bd9445„´precedence_heuristic 
§cell_idÙ$685a7ba3-0f94-4663-a68a-73fa03bd9445´downstream_cells_map³make_greedy_policy!“Ù$84a71bf8-0d66-42cd-ac7b-589d63a16edaÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1²upstream_cells_mapÞ¤zero©@fastmath¦isless©@inbounds§nothing¦Vector¡<¶Base.FastMath.max_fast¯Base.simd_index©eachindex¤Real¥@simd¡/¦Matrix£â‰ˆ³FiniteAfterstateMDP‘Ù$393cd9d2-dd97-496e-b260-ec6e8b1c13b5¡:£Inf®julia.simdloop¤BaseµBase.simd_outer_range¡-¶Base.simd_inner_length¶Base.FastMath.add_fast¡+Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc„´precedence_heuristic §cell_idÙ$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc´downstream_cells_map¨takestep•Ù$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$eb735ead-978b-409c-8990-b5fa7a027ebfÙ$2034fd1e-5171-4eda-85d5-2de62d7a1e8bÙ$292d9018-b550-4278-a8e0-78dd6a6853f1Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820c²upstream_cells_map…sample_action‘Ù$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3f¦Matrix¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5¨Function¤RealÙ$bce6e4ab-58ec-4e00-be34-bc4caf51f57d„´precedence_heuristic §cell_idÙ$bce6e4ab-58ec-4e00-be34-bc4caf51f57d´downstream_cells_map¨cum_mean‘Ù$fa04d20f-6e3f-46f8-b3f7-a543d1fa360a²upstream_cells_mapˆ¦length¤zero®AbstractVector©enumerate¡/¥zeros¡+¤RealÙ$4ddcd409-c31c-444c-8fcf-7cc45b68d93b„´precedence_heuristic §cell_idÙ$4ddcd409-c31c-444c-8fcf-7cc45b68d93b´downstream_cells_map¨make_mrp”Ù$4b0d96d0-25d1-4fed-b105-c65fa2883a61Ù$2786101e-d365-4d6a-8de7-b9794499efb4Ù$ddf3bb61-16c9-48c4-95d4-263260309762Ù$1e3d231a-4065-48ce-a74e-018066fb232a²upstream_cells_map¡:§collect¤ceil¤rand©mrp_moves‘Ù$846720cc-550a-4a3c-a80e-40b99671f4e2¥Int64¥floor£mod§Float32¡/¡+¢==¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$c5d32889-634b-4b00-8ba7-0d1ecaf94f05„´precedence_heuristic 
§cell_idÙ$c5d32889-634b-4b00-8ba7-0d1ecaf94f05´downstream_cells_map½initialize_state_action_value•Ù$61bbf9db-49a0-4709-83f4-44f228be09c0Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8bÙ$292d9018-b550-4278-a8e0-78dd6a6853f1Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820cÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134²upstream_cells_map…¦length¤ones¡*¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloatÙ$3b16cbb7-f859-4871-9a63-8b40eb4191be„´precedence_heuristic §cell_idÙ$3b16cbb7-f859-4871-9a63-8b40eb4191be´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$902738c3-2f7b-49cb-8580-29359c857027„´precedence_heuristic §cell_idÙ$902738c3-2f7b-49cb-8580-29359c857027´downstream_cells_map€²upstream_cells_map‰¡:·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¹HypertextLiteral.StyleTag©mapreduce¤@htl«addelements‘Ù$f841c4d8-5176-4007-b472-9e01a799d85cªstatestyle‘Ù$889611fb-7dac-4769-9251-9a90e3a1422fÙ$c93ed1f2-3c38-4f68-8bf8-2cdf4e7bee34„´precedence_heuristic §cell_idÙ$c93ed1f2-3c38-4f68-8bf8-2cdf4e7bee34´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$f36822d7-9ea8-4f5c-9925-dc2a466a68ba„´precedence_heuristic §cell_idÙ$f36822d7-9ea8-4f5c-9925-dc2a466a68ba´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$3e367811-247b-4bd6-b8fe-63f8996fb9e8„´precedence_heuristic §cell_idÙ$3e367811-247b-4bd6-b8fe-63f8996fb9e8´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1„´precedence_heuristic §cell_idÙ$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1´downstream_cells_map¸jacks_car_afterstate_mdp‘Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893²upstream_cells_mapÙ create_car_rental_afterstate_mdp‘Ù$ad03500a-bd42-4216-a9cb-3f923152af79Ù$c4719c42-87aa-482a-95aa-a1492d42835d„´precedence_heuristic §cell_idÙ$c4719c42-87aa-482a-95aa-a1492d42835d´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$495f5606-0567-47ad-a266-d21320eecfc6„´precedence_heuristic 
§cell_idÙ$495f5606-0567-47ad-a266-d21320eecfc6´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$0a4ed8c7-27ca-45cb-af15-70ddd86240fb„´precedence_heuristic §cell_idÙ$0a4ed8c7-27ca-45cb-af15-70ddd86240fb´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$cdedd35e-52b8-40a5-938d-2d36f6f93217„´precedence_heuristic §cell_idÙ$cdedd35e-52b8-40a5-938d-2d36f6f93217´downstream_cells_map³king_action_display—Ù$f0f9d3d5-e76a-4472-bfb1-da29d73a7916Ù$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7Ù$8bc54c94-9c92-4904-b3a6-13ff3f0110bbÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cbÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6dÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdab²upstream_cells_map„·HypertextLiteral.Bypass·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¤@htlÙ$3756a3f8-18e8-4d62-afa1-cfeb4183820c„´precedence_heuristic §cell_idÙ$3756a3f8-18e8-4d62-afa1-cfeb4183820c´downstream_cells_mapµdouble_expected_sarsa“Ù$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$69eedbfd-396f-4461-b7a1-c36abc094581Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_mapÞ¤zero£sum¡!¦isless£one¦Vector¦length¡<¤copy©eachindex¡/¦Matrix½initialize_state_action_value‘Ù$c5d32889-634b-4b00-8ba7-0d1ecaf94f05¢==¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloat¡:¥first¥zeros¤rand§findall¥Int64¨takestep‘Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc¡-´create_greedy_policy‘Ù$84a71bf8-0d66-42cd-ac7b-589d63a16eda¡+¥undef¡*¶make_Ïµ_greedy_policy!‘Ù$6b496582-cc0e-4195-87ef-94792b0fff54·create_Ïµ_greedy_policy‘Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$04a0be81-ee5f-4eeb-963a-ad930392d50b„´precedence_heuristic §cell_idÙ$04a0be81-ee5f-4eeb-963a-ad930392d50b´downstream_cells_map€²upstream_cells_map«example_6_5‘Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$136d1d96-b590-4f03-9e42-2337efc560cc„´precedence_heuristic §cell_idÙ$136d1d96-b590-4f03-9e42-2337efc560cc´downstream_cells_map€²upstream_cells_map¤HTMLÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0„´precedence_heuristic 
§cell_idÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0´downstream_cells_mapºgridworld_Q_vs_sarsa_solve‘Ù$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0²upstream_cells_mapÞ¡:¥sarsa‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0·HypertextLiteral.Bypass¥Tuple©plot_path’Ù$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521¸HypertextLiteral.content©mapreduce£zip¤@htlªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b©eachindex´create_greedy_policy‘Ù$84a71bf8-0d66-42cd-ac7b-589d63a16eda§scatter¤plot¡/°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¤attr·HypertextLiteral.Result¤fill¦LayoutÙ$f95ceb98-f12e-4650-9ad3-0609b7ecd0f3„´precedence_heuristic §cell_idÙ$f95ceb98-f12e-4650-9ad3-0609b7ecd0f3´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$8787a5fd-d0ab-46b5-a7df-e7bc103a7378„´precedence_heuristic §cell_idÙ$8787a5fd-d0ab-46b5-a7df-e7bc103a7378´downstream_cells_map²value_iteration_v!‘Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1²upstream_cells_map…¢<=¶bellman_optimal_value!’Ù$18e60b1d-97ec-432c-a388-003e7fae415fÙ$dea61907-d4fb-492d-b2bb-c037c7f785cb¥push!¡-¤copyÙ$03a06e10-f68a-403c-97bf-7a7627f2c5d6„´precedence_heuristic §cell_idÙ$03a06e10-f68a-403c-97bf-7a7627f2c5d6´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$0d6a11af-b146-4bbc-997e-a11b897269a7„´precedence_heuristic §cell_idÙ$0d6a11af-b146-4bbc-997e-a11b897269a7´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$72b4d8d5-464c-4561-8c69-28ef3f59630b„´precedence_heuristic §cell_idÙ$72b4d8d5-464c-4561-8c69-28ef3f59630b´downstream_cells_mapupdate_value!‘Ù$3f3ebc9b-b070-4d73-8be9-823b399c664c²upstream_cells_mapŽ¡:¤zero£max¦isless¢MC‘Ù$620a6426-cb29-4010-997b-aa4f9d5f8fb0¨Function¦Vector¦length¡-¡+ªcalc_error‘Ù$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8¡*¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5AbstractFloatÙ$47c2cbdd-f6db-4ce5-bae2-8141f30aacbc„´precedence_heuristic 
§cell_idÙ$47c2cbdd-f6db-4ce5-bae2-8141f30aacbc´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$8224b808-5778-458b-b683-ea2603c82117„´precedence_heuristic §cell_idÙ$8224b808-5778-458b-b683-ea2603c82117´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$c4919d14-8cba-43e6-9369-efc52bcb9b23„´precedence_heuristic §cell_idÙ$c4919d14-8cba-43e6-9369-efc52bcb9b23´downstream_cells_map³make_greedy_policy!“Ù$84a71bf8-0d66-42cd-ac7b-589d63a16edaÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1²upstream_cells_mapÞ£â‰ˆ¤zero¡:£max£sum£Inf¦isless¦Vector¤Real©eachindex¡-©FiniteMDP‘Ù$d7566d1b-8938-4e2c-8c54-124f790e72ae¡/¦Matrix¡+¡*Ù$05664aaf-575b-4249-974c-d8a2e63f380a„´precedence_heuristic §cell_idÙ$05664aaf-575b-4249-974c-d8a2e63f380a´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$dda222ef-8178-40bb-bf20-d242924c4fab„´precedence_heuristic §cell_idÙ$dda222ef-8178-40bb-bf20-d242924c4fab´downstream_cells_map®king_gridworld’Ù$f0f9d3d5-e76a-4472-bfb1-da29d73a7916Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6d²upstream_cells_map‚¬king_actions‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1´make_windy_gridworld‘Ù$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$48b557e3-e239-45e9-ab15-105bcca96492„´precedence_heuristic §cell_idÙ$48b557e3-e239-45e9-ab15-105bcca96492´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$846720cc-550a-4a3c-a80e-40b99671f4e2„´precedence_heuristic §cell_idÙ$846720cc-550a-4a3c-a80e-40b99671f4e2´downstream_cells_map©mrp_moves‘Ù$4ddcd409-c31c-444c-8fcf-7cc45b68d93b²upstream_cells_map€Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196„´precedence_heuristic 
§cell_idÙ$6556dafb-04fa-434c-868a-8d7bb7b5b196´downstream_cells_map¯make_cliffworld‘Ù$6faa3015-3ac4-44af-a78c-10b175822441²upstream_cells_map¬rook_actions‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbAbstractFloat¡:¡>¦isless¡<¥Int64¯GridworldAction‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¥clamp¢==¤move“Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$031e1106-7408-4c7e-b78e-b713c19123d1Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7de®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$3f4f078a-9fc4-4b02-b499-a805fd5f1071„´precedence_heuristic §cell_idÙ$3f4f078a-9fc4-4b02-b499-a805fd5f1071´downstream_cells_map»max_bias_visualization_comp‘Ù$2651af2d-56a8-4f7e-a56a-45cabd665c72²upstream_cells_mapŽ¥randn¦argmax¡:§collect¤view©mapreduce§scatter¤plot¡/¡+¡*¦Layout¤mean§maximumÙ$75bfe913-8757-4789-b708-7d400c225218„´precedence_heuristic §cell_idÙ$75bfe913-8757-4789-b708-7d400c225218´downstream_cells_map€²upstream_cells_mapˆ¯windy_gridworld‘Ù$ab331778-f892-4690-8bb3-26464e3fc05f·HypertextLiteral.Bypass³rook_action_display‘Ù$500d8dd4-fc53-4021-b797-114224ca4deb°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8·HypertextLiteral.Result¸HypertextLiteral.content©plot_path’Ù$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521¤@htlÙ$fe2ebf39-4ab3-4aa8-abbd-23389eaf400e„´precedence_heuristic §cell_idÙ$fe2ebf39-4ab3-4aa8-abbd-23389eaf400e´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$98bec66e-d8f3-4d4d-b4ec-5838489164e5„´precedence_heuristic §cell_idÙ$98bec66e-d8f3-4d4d-b4ec-5838489164e5´downstream_cells_map¯noisy_gridworld“Ù$b5e06f59-33b5-414e-9a81-43e8abd07aa3Ù$297f1606-4ec2-4075-9f81-926dc517b76fÙ$33d69db9-fa2b-40a3-bbed-21d5fd60f302²upstream_cells_map‚´make_noisy_gridworld‘Ù$64b210e8-223f-41f7-a6b7-8af6183ddf87¨gridsize‘Ù$0c0b875e-69f8-46ed-ad06-df9c36088fbeÙ$b59eacf8-7f78-4015-bf2c-66f89bf0e24e„´precedence_heuristic 
§cell_idÙ$b59eacf8-7f78-4015-bf2c-66f89bf0e24e´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5„´precedence_heuristic §cell_idÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$7d3be915-9092-4261-8435-dd546a7db144„´precedence_heuristic §cell_idÙ$7d3be915-9092-4261-8435-dd546a7db144´downstream_cells_map§cum_max‘Ù$fa04d20f-6e3f-46f8-b3f7-a543d1fa360a²upstream_cells_map‡§similar£max®AbstractVector©enumerate¥first¦isless¤RealÙ$71774d5f-7841-403f-bc6b-1a0cbbb72d6d„´precedence_heuristic §cell_idÙ$71774d5f-7841-403f-bc6b-1a0cbbb72d6d´downstream_cells_map¶windy_gridworld_mdp_dp²upstream_cells_map„©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¬rook_actions‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb´create_gridworld_mdp’Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$7ac99619-5232-4db8-8553-d79ea5415d29Ù$22c2213e-5b9b-410f-a0ef-8f1e3db3c532„´precedence_heuristic §cell_idÙ$22c2213e-5b9b-410f-a0ef-8f1e3db3c532´downstream_cells_map€²upstream_cells_mapƒ«example_6_3‘Ù$1e3d231a-4065-48ce-a74e-018066fb232a§Float32ªparams_6_2‘Ù$187fc682-2282-46ca-b988-c9de438f36fdÙ$39470c74-e554-4f6c-919d-97bec1eec0f3„´precedence_heuristic §cell_idÙ$39470c74-e554-4f6c-919d-97bec1eec0f3´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297„´precedence_heuristic §cell_idÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297´downstream_cells_map°show_grid_policy’Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$c34678f6-53bb-4f2a-96f0-a7b16f894ddd²upstream_cells_mapÞ³king_action_display‘Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217©findfirst¡:·HypertextLiteral.Bypass¸HypertextLiteral.content©mapreduce¤@htl¦Vector©eachindex¡-¤HTMLÙ HypertextLiteral.attribute_value·HypertextLiteral.Result°HypertextLiteral‘Ù$639840dc-976a-4e5c-987f-a92afb2d99d8¡+¹HypertextLiteral.StyleTag¡*§maximumÙ$415ea466-2038-48fe-9d24-39a90182f1eb„´precedence_heuristic 
§cell_idÙ$415ea466-2038-48fe-9d24-39a90182f1eb´downstream_cells_map²monte_carlo_pred_V‘Ù$2786101e-d365-4d6a-8de7-b9794499efb4²upstream_cells_mapÞ¤zero¡:¥zeros§Integer¦Vector¦length§findall¡-©enumerate¶initialize_state_value‘Ù$401831c3-3925-465c-a093-28686f0dad2e¦Matrix¡+¬check_policy‘Ù$24a441c8-7aaf-4642-b245-5e1201456d67¡*¦MDP_TD‘Ù$3e767962-7339-4f35-a039-b5521a098ed5ªrunepisode’Ù$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4aAbstractFloatÙ$0e488135-49e5-4e71-83b1-05d8e61f0510„´precedence_heuristic §cell_idÙ$0e488135-49e5-4e71-83b1-05d8e61f0510´downstream_cells_map¹kingplus_gridworld_mdp_dp²upstream_cells_map…¬king_actions‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1©wind_vals‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb®GridworldState‘Ù$e19db54c-4b3c-42d1-b016-9620daf89bfb¤Stay‘Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7de´create_gridworld_mdp’Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$7ac99619-5232-4db8-8553-d79ea5415d29Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893„´precedence_heuristic §cell_idÙ$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893´downstream_cells_mapƒ¶car_afterstate_results±Ï€_car_afterstate°v_car_afterstate²upstream_cells_map‡´makepolicyvalueplots‘Ù$30e663da-282c-42ff-8171-dbe3c5c467c6§@md_str¦length¸jacks_car_afterstate_mdp‘Ù$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1·begin_value_iteration_v“Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1Ù$3134e913-1e86-495d-a558-c3ec4828bf7b£end¨getindexÙ$6d9ae541-cf8c-4687-9f0a-f008944657e3„´precedence_heuristic 
§cell_idÙ$6d9ae541-cf8c-4687-9f0a-f008944657e3´downstream_cells_mapªfigure_6_3‘Ù$cafedde8-be94-4697-a511-510a5fea0155²upstream_cells_mapÞ!¦Layout®Base.Threads.*¥sarsa‘Ù$61bbf9db-49a0-4709-83f4-44f228be09c0»Base.Threads.threadpoolsize£zip®Base.Threads.-¦lengthªq_learning‘Ù$2034fd1e-5171-4eda-85d5-2de62d7a1e8b®Base.Threads.+©eachindex«deserialize§scatter²Base.Threads.error®Base.Threads.>¯Base.Threads.!=¤mean¨@threads®expected_sarsa‘Ù$292d9018-b550-4278-a8e0-78dd6a6853f1¡:ºBase.Threads.threading_run¥zeros¦isfile³Base.Threads.divrem¤Base·Base.Threads.firstindex¤plot¤attr³Base.Threads.length®Base.Threads.:¯Base.Threads.<=¯Base.Threads.==¥ccall©serializeÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8„´precedence_heuristic §cell_idÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$f2115666-86ce-4c80-9eb7-490cc7a7715c„´precedence_heuristic §cell_idÙ$f2115666-86ce-4c80-9eb7-490cc7a7715c´downstream_cells_map€²upstream_cells_map‚§@md_str¨getindexÙ$2155adfa-7a93-4960-950e-1b123da9eea4„´precedence_heuristic 
§cell_idÙ$2155adfa-7a93-4960-950e-1b123da9eea4´downstream_cells_map€²upstream_cells_map¬king_actions‘Ù$031e1106-7408-4c7e-b78e-b713c19123d1´cell_execution_orderÜáÙ$639840dc-976a-4e5c-987f-a92afb2d99d8Ù$814d89be-cfdf-11ec-3295-49a8f302bbcfÙ$495f5606-0567-47ad-a266-d21320eecfc6Ù$410abe1d-04a6-4434-9abf-0d29dd6498e6Ù$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3fÙ$834e5810-77ea-4dfd-9f37-9d9dbf6585a4Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$8e34202a-f841-4464-9017-cd50194f7987Ù$401831c3-3925-465c-a093-28686f0dad2eÙ$c5d32889-634b-4b00-8ba7-0d1ecaf94f05Ù$24a441c8-7aaf-4642-b245-5e1201456d67Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dcÙ$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4aÙ$eb735ead-978b-409c-8990-b5fa7a027ebfÙ$415ea466-2038-48fe-9d24-39a90182f1ebÙ$a0d2333f-e87b-4981-bb52-d436ec6481c1Ù$3b16cbb7-f859-4871-9a63-8b40eb4191beÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8Ù$c93ed1f2-3c38-4f68-8bf8-2cdf4e7bee34Ù$1e3b3234-3fe1-46c9-82b7-f729c656eb25Ù$c09530bc-f37e-4d57-a267-14d4027147daÙ$b5187232-d808-49b6-9f7e-a4cbeb6c2b3eÙ$5f32fed0-c921-4cbb-85fe-ade54d4c6c95Ù$bc8bad61-a49a-47d6-8fa6-7dcf6c221910Ù$6edb550d-5c9f-4ea6-8746-6632806df11eÙ$0f22e85f-ed31-49df-a7c7-0579298f05feÙ$9017093c-a9c3-40ea-a9c6-881ee62fc379Ù$5290ae65-6f56-4849-a842-fe347315c6dcÙ$47c2cbdd-f6db-4ce5-bae2-8141f30aacbcÙ$5455fc97-55cb-4b0e-a3be-9433ccc96fc0Ù$a9dda9b5-f568-481c-9e8f-9bb887468775Ù$846720cc-550a-4a3c-a80e-40b99671f4e2Ù$4ddcd409-c31c-444c-8fcf-7cc45b68d93bÙ$4b0d96d0-25d1-4fed-b105-c65fa2883a61Ù$64fe8336-d1c2-41fe-a522-1b6f63260fc9Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6Ù$53145cc2-784c-468b-8e91-9bb7866db218Ù$54d97122-2d01-46ec-aafe-00bfc9f2d6d1Ù$a5009785-64b4-489b-a967-f7840b4a9463Ù$de50f95f-984e-4387-958c-64e0265f5953Ù$e4c6456c-867d-4ade-a3c8-310c1e065f14Ù$f841c4d8-5176-4007-b472-9e01a799d85cÙ$889611fb-7dac-4769-9251-9a90e3a1422fÙ$902738c3-2f7b-49cb-8580-29359c857027Ù$510761f6-66c7-4faf-937b-e1422ec829a6Ù$87fadfc0-2cdb-4be2-81ad-e8fdeffb690cÙ$1dd1ba55-548a-41f6-903e-70742fd60e3dÙ$2786101e-d36
5-4d6a-8de7-b9794499efb4Ù$9db7a268-1e6d-4366-a0ec-ebf54916d3b0Ù$0b9c6dbd-4eb3-4167-886e-64db9ec7ff04Ù$52aebb7b-c2a9-443f-bc03-24cd25793b32Ù$e6672866-c0a0-46f2-bb52-25fcc3352645Ù$f2115666-86ce-4c80-9eb7-490cc7a7715cÙ$c360945e-f8b2-4c6f-a70c-6ab4ddcf5b54Ù$ddf3bb61-16c9-48c4-95d4-263260309762Ù$e8f94345-9ad5-48d4-8709-d796fb55db3fÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6bÙ$105c5c23-270d-437e-89dd-12297814c6e0Ù$48b557e3-e239-45e9-ab15-105bcca96492Ù$187fc682-2282-46ca-b988-c9de438f36fdÙ$0a4ed8c7-27ca-45cb-af15-70ddd86240fbÙ$620a6426-cb29-4010-997b-aa4f9d5f8fb0Ù$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8Ù$209881b3-3ac8-490e-97bd-fa5ae24a39f5Ù$72b4d8d5-464c-4561-8c69-28ef3f59630bÙ$3f3ebc9b-b070-4d73-8be9-823b399c664cÙ$1e3d231a-4065-48ce-a74e-018066fb232aÙ$22c2213e-5b9b-410f-a0ef-8f1e3db3c532Ù$0e59e813-3d48-4a24-b5b3-9a9de7c500c2Ù$0d6a11af-b146-4bbc-997e-a11b897269a7Ù$a925534e-f9b8-471a-9d86-c9212129b630Ù$62a9a36a-bedb-4f5a-80a4-2d4111a65c12Ù$b35264b0-ac5b-40ce-95e4-9b2bc4cb106fÙ$4d7619ee-933f-452a-9202-e95a8f3da20fÙ$fe2ebf39-4ab3-4aa8-abbd-23389eaf400eÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5Ù$6a1503c6-c77b-4e3a-9f07-74b2af1a5ff7Ù$6b496582-cc0e-4195-87ef-94792b0fff54Ù$cb07a6a5-c50a-4900-9e5b-a17dc7ee5710Ù$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$12aac612-758b-4655-8ede-daddd4af6d3eÙ$3ed12c33-ab0a-49b1-b9e7-c4305ba35767Ù$61bbf9db-49a0-4709-83f4-44f228be09c0Ù$8d05403a-adeb-40ac-a98a-87586d5a5170Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$500d8dd4-fc53-4021-b797-114224ca4debÙ$136d1d96-b590-4f03-9e42-2337efc560ccÙ$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6Ù$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521Ù$0ad739c9-8aca-4b82-bf20-c73584d29535Ù$031e1106-7408-4c7e-b78e-b713c19123d1Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4Ù$2155adfa-7a93-4960-950e-1b123da9eea4Ù$d259ecca-0249-4b28-a4d7-6880d4d84495Ù$39470c74-e554-4f6c-919d-97bec1eec0f3Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7deÙ$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$ab331778-f892-4690-8bb3-26464e3fc05f
Ù$75bfe913-8757-4789-b708-7d400c225218Ù$dda222ef-8178-40bb-bf20-d242924c4fabÙ$db31579e-3e56-4271-8fc3-eb13bc95ac27Ù$b59eacf8-7f78-4015-bf2c-66f89bf0e24eÙ$02f34da1-551f-4ce5-a588-7f3a14afd716Ù$aa0791a5-8cf1-499b-9900-4d0c59be808cÙ$4ddc7d99-0b79-4689-bd93-8798b105c0a2Ù$2d881aa9-1da3-4d1e-8d05-245956dbaf33Ù$8bc54c94-9c92-4904-b3a6-13ff3f0110bbÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cbÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297Ù$44c49006-e210-4f97-916e-fe62f36c593fÙ$2034fd1e-5171-4eda-85d5-2de62d7a1e8bÙ$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$9d01c0ef-6313-4091-b444-3e9765aba90cÙ$4b1a4c14-3c2b-40c0-995c-cd0334ed8b3aÙ$897fde24-9a4a-465e-96f2-dd9e8baab294Ù$f2776908-d06a-4073-b2ce-ecbf109c9cc7Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6dÙ$c4719c42-87aa-482a-95aa-a1492d42835dÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdabÙ$8224b808-5778-458b-b683-ea2603c82117Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196Ù$6faa3015-3ac4-44af-a78c-10b175822441Ù$05664aaf-575b-4249-974c-d8a2e63f380aÙ$2a3e4617-efbb-4bbc-9c61-8535628e439cÙ$6e06bd39-486f-425a-bbca-bf363b58988cÙ$292d9018-b550-4278-a8e0-78dd6a6853f1Ù$047a8881-c2ec-4dd1-8778-e3acf9beba2eÙ$21fbdc3b-4444-4f56-9934-fb58e184d685Ù$c8500b89-644d-407f-881a-bcbd7da23502Ù$6d9ae541-cf8c-4687-9f0a-f008944657e3Ù$cafedde8-be94-4697-a511-510a5fea0155Ù$29b0a2d5-9629-46cd-b57c-6f3ef797de66Ù$01582b3b-c4d0-4691-9edf-f77e6d8be2c9Ù$4862942b-d1e2-4ac8-8e88-65205e91a070Ù$f474fcbd-e3c3-49fd-a6b7-6d6a8a7dda09Ù$2c49900b-3c57-4d9a-b3dc-ef9cc20c30c1Ù$0163763b-a15f-447e-b3d2-32d4bf9d2605Ù$3e367811-247b-4bd6-b8fe-63f8996fb9e8Ù$4c1b286c-2ba9-4293-81e1-bf360baa75faÙ$c5718459-2323-4615-b2c4-f92a0fa189d9Ù$03a06e10-f68a-403c-97bf-7a7627f2c5d6Ù$573a9919-bd7e-4a56-b830-4e40e91288efÙ$bce6e4ab-58ec-4e00-be34-bc4caf51f57dÙ$7d3be915-9092-4261-8435-dd546a7db144Ù$fa04d20f-6e3f-46f8-b3f7-a543d1fa360aÙ$ff5d051e-5de1-48a9-9578-5dbafd71afd1Ù$3f4f078a-9fc4-4b02-b499-a805fd5f1071Ù$2651af2d-56a8-4f7e-a56a-45cabd665c72Ù$e039a5be-4b59-4023-be97-2d1de970be27Ù$223055df-7d5c-4d99-bc8d-fbc9702f906fÙ$926ec37d-b969-4dc9-99b
2-a6b29c6d880cÙ$c1d6532c-38a4-488f-9789-07d63fe6f125Ù$84d81413-6334-4965-8632-8a763cd3f28aÙ$4382928c-6325-4ecd-b7cf-282525a270abÙ$8fe856ec-5f0a-4483-bb7d-3f6fe270b6f3Ù$f11dca8f-5557-49fc-9720-35034eadba57Ù$d83ff60f-8973-4dc1-9358-5ad109ea5490Ù$e26f788e-f602-403e-929e-6c98a6e6bf79Ù$c9f7646a-ec01-4d90-9215-5027b7c1c885Ù$0201ae9f-4a31-497e-86ab-62b454ca85deÙ$943b6d7e-14a4-4532-90c7-dd5080be0c6eÙ$0c0b875e-69f8-46ed-ad06-df9c36088fbeÙ$64b210e8-223f-41f7-a6b7-8af6183ddf87Ù$98bec66e-d8f3-4d4d-b4ec-5838489164e5Ù$42799973-9884-4a0e-b29a-039890e92d21Ù$35dc0d94-145a-4292-b0df-9e84a286c036Ù$6029990b-eb31-45ae-a869-b789fba673a6Ù$b37f2395-1480-4c7c-b6c0-eba391e969d7Ù$c306867b-f137-44f2-97dd-3d10c226ca5cÙ$a3d10753-2ec3-4252-9629-834145678b6aÙ$f95ceb98-f12e-4650-9ad3-0609b7ecd0f3Ù$d5b612d8-82a1-4586-b721-1baaea2101cfÙ$f36822d7-9ea8-4f5c-9925-dc2a466a68baÙ$14b456f9-5fd1-4340-a3c7-ab9b91b4e3e0Ù$22c4ce8c-bd82-4eb3-8af5-55342018edffÙ$d7566d1b-8938-4e2c-8c54-124f790e72aeÙ$393cd9d2-dd97-496e-b260-ec6e8b1c13b5Ù$18e60b1d-97ec-432c-a388-003e7fae415fÙ$685a7ba3-0f94-4663-a68a-73fa03bd9445Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9Ù$0748902c-ffc0-4634-9a1b-e642b3dfb77bÙ$c4919d14-8cba-43e6-9369-efc52bcb9b23Ù$84a71bf8-0d66-42cd-ac7b-589d63a16edaÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0Ù$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0Ù$84584793-8274-4aa1-854f-b167c7434548Ù$667666b9-3ab6-4836-953d-9878208103c9Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820cÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$69eedbfd-396f-4461-b7a1-c36abc094581Ù$00d67a93-437c-4cda-899a-9daa1102e1f2Ù$b5e06f59-33b5-414e-9a81-43e8abd07aa3Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$07c57f37-22be-4c39-8279-d80addcea0c5Ù$7ac99619-5232-4db8-8553-d79ea5415d29Ù$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4Ù$297f1606-4ec2-4075-9f81-926dc517b76fÙ$71774d5f-7841-403f-bc6b-1a0cbbb72d6dÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4Ù$0e488135-49e5-4e71-83b1-05d8e61f0510Ù$8e15f4b5-0dc7-47a5-9477-9f4d8807b331Ù$dea61907-d4fb-492d-b2bb-c037c7f785cbÙ$8787a5fd-d0ab-46b5-a7df-e7bc103a7378Ù$4019c974
-dcaa-46c8-ac90-e6566a376ea1Ù$3134e913-1e86-495d-a558-c3ec4828bf7bÙ$d299d800-a64e-4ba2-9603-efa833343405Ù$04a0be81-ee5f-4eeb-963a-ad930392d50bÙ$f0f9d3d5-e76a-4472-bfb1-da29d73a7916Ù$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06Ù$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7Ù$33d69db9-fa2b-40a3-bbed-21d5fd60f302Ù$e4e80015-40ce-4f8a-aac7-4a9584da4baaÙ$dd167494-99d6-45c6-99e4-c36fde5e2d3fÙ$b3d4117f-7db4-43a6-8427-c08f3542d71fÙ$ad03500a-bd42-4216-a9cb-3f923152af79Ù$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1Ù$2455742f-dc18-4d6b-9f58-5666adac6919Ù$c2f56287-9a3e-454a-9ec1-53184b788db9Ù$7ed07ddc-1c63-4ce7-bfd3-6da54304d297Ù$30e663da-282c-42ff-8171-dbe3c5c467c6Ù$bb085f2e-83cb-45b2-adf6-c07da892d6e1Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893´last_hot_reload_timeË©shortpathÙ*Chapter_06_Temporal_Difference_Learning.jl®process_status¥ready¤pathÙµ/home/runner/work/Reinforcement-Learning-Sutton-Barto-Exercise-Solutions/Reinforcement-Learning-Sutton-Barto-Exercise-Solutions/Chapter-06/Chapter_06_Temporal_Difference_Learning.jlpluto_version§v0.20.8®last_save_timeËAÚ•Þ‡Êªcell_orderÜáÙ$814d89be-cfdf-11ec-3295-49a8f302bbcfÙ$495f5606-0567-47ad-a266-d21320eecfc6Ù$410abe1d-04a6-4434-9abf-0d29dd6498e6Ù$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3fÙ$834e5810-77ea-4dfd-9f37-9d9dbf6585a4Ù$3e767962-7339-4f35-a039-b5521a098ed5Ù$8e34202a-f841-4464-9017-cd50194f7987Ù$401831c3-3925-465c-a093-28686f0dad2eÙ$c5d32889-634b-4b00-8ba7-0d1ecaf94f05Ù$24a441c8-7aaf-4642-b245-5e1201456d67Ù$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dcÙ$bfe71b40-3157-47df-8494-67f8eb8e4e93Ù$7035c082-6e50-4df5-919f-5f09d2011b4aÙ$eb735ead-978b-409c-8990-b5fa7a027ebfÙ$415ea466-2038-48fe-9d24-39a90182f1ebÙ$a0d2333f-e87b-4981-bb52-d436ec6481c1Ù$3b16cbb7-f859-4871-9a63-8b40eb4191beÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8Ù$c93ed1f2-3c38-4f68-8bf8-2cdf4e7bee34Ù$1e3b3234-3fe1-46c9-82b7-f729c656eb25Ù$c09530bc-f37e-4d57-a267-14d4027147daÙ$b5187232-d808-49b6-9f7e-a4cbeb6c2b3eÙ$5f32fed0-c921-4cbb-85fe-ade54d4c6c95Ù$bc8bad61-a49a-47d6-8fa6-7dcf6c221910Ù$6edb550d-5c9f-4ea6-8746-6632806df
11eÙ$0f22e85f-ed31-49df-a7c7-0579298f05feÙ$9017093c-a9c3-40ea-a9c6-881ee62fc379Ù$5290ae65-6f56-4849-a842-fe347315c6dcÙ$47c2cbdd-f6db-4ce5-bae2-8141f30aacbcÙ$5455fc97-55cb-4b0e-a3be-9433ccc96fc0Ù$12c5efe4-d64d-4b82-877c-29b0e537fee6Ù$53145cc2-784c-468b-8e91-9bb7866db218Ù$54d97122-2d01-46ec-aafe-00bfc9f2d6d1Ù$e4c6456c-867d-4ade-a3c8-310c1e065f14Ù$9db7a268-1e6d-4366-a0ec-ebf54916d3b0Ù$a9dda9b5-f568-481c-9e8f-9bb887468775Ù$846720cc-550a-4a3c-a80e-40b99671f4e2Ù$4ddcd409-c31c-444c-8fcf-7cc45b68d93bÙ$4b0d96d0-25d1-4fed-b105-c65fa2883a61Ù$64fe8336-d1c2-41fe-a522-1b6f63260fc9Ù$a5009785-64b4-489b-a967-f7840b4a9463Ù$de50f95f-984e-4387-958c-64e0265f5953Ù$f841c4d8-5176-4007-b472-9e01a799d85cÙ$902738c3-2f7b-49cb-8580-29359c857027Ù$889611fb-7dac-4769-9251-9a90e3a1422fÙ$510761f6-66c7-4faf-937b-e1422ec829a6Ù$87fadfc0-2cdb-4be2-81ad-e8fdeffb690cÙ$1dd1ba55-548a-41f6-903e-70742fd60e3dÙ$2786101e-d365-4d6a-8de7-b9794499efb4Ù$0b9c6dbd-4eb3-4167-886e-64db9ec7ff04Ù$52aebb7b-c2a9-443f-bc03-24cd25793b32Ù$e6672866-c0a0-46f2-bb52-25fcc3352645Ù$e8f94345-9ad5-48d4-8709-d796fb55db3fÙ$f2115666-86ce-4c80-9eb7-490cc7a7715cÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6bÙ$c360945e-f8b2-4c6f-a70c-6ab4ddcf5b54Ù$ddf3bb61-16c9-48c4-95d4-263260309762Ù$105c5c23-270d-437e-89dd-12297814c6e0Ù$48b557e3-e239-45e9-ab15-105bcca96492Ù$187fc682-2282-46ca-b988-c9de438f36fdÙ$22c2213e-5b9b-410f-a0ef-8f1e3db3c532Ù$0a4ed8c7-27ca-45cb-af15-70ddd86240fbÙ$620a6426-cb29-4010-997b-aa4f9d5f8fb0Ù$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8Ù$209881b3-3ac8-490e-97bd-fa5ae24a39f5Ù$72b4d8d5-464c-4561-8c69-28ef3f59630bÙ$3f3ebc9b-b070-4d73-8be9-823b399c664cÙ$1e3d231a-4065-48ce-a74e-018066fb232aÙ$0e59e813-3d48-4a24-b5b3-9a9de7c500c2Ù$0d6a11af-b146-4bbc-997e-a11b897269a7Ù$a925534e-f9b8-471a-9d86-c9212129b630Ù$62a9a36a-bedb-4f5a-80a4-2d4111a65c12Ù$b35264b0-ac5b-40ce-95e4-9b2bc4cb106fÙ$4d7619ee-933f-452a-9202-e95a8f3da20fÙ$fe2ebf39-4ab3-4aa8-abbd-23389eaf400eÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5Ù$6a1503c6-c77b-4e3a-9f07-74b2af1a5ff7Ù$6b496582-cc0e-4195-
87ef-94792b0fff54Ù$cb07a6a5-c50a-4900-9e5b-a17dc7ee5710Ù$84a71bf8-0d66-42cd-ac7b-589d63a16edaÙ$4d4577b5-3753-450d-a247-ebd8c3e8f799Ù$12aac612-758b-4655-8ede-daddd4af6d3eÙ$3ed12c33-ab0a-49b1-b9e7-c4305ba35767Ù$61bbf9db-49a0-4709-83f4-44f228be09c0Ù$8d05403a-adeb-40ac-a98a-87586d5a5170Ù$75bfe913-8757-4789-b708-7d400c225218Ù$e19db54c-4b3c-42d1-b016-9620daf89bfbÙ$ec285c96-4a75-4af6-8898-ec3176fa34c6Ù$ab331778-f892-4690-8bb3-26464e3fc05fÙ$500d8dd4-fc53-4021-b797-114224ca4debÙ$136d1d96-b590-4f03-9e42-2337efc560ccÙ$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6Ù$9f28772c-9afe-4253-ab3b-055b0f48be6eÙ$bd1029f9-d6a8-4c68-98cd-8af94297b521Ù$d299d800-a64e-4ba2-9603-efa833343405Ù$04a0be81-ee5f-4eeb-963a-ad930392d50bÙ$0ad739c9-8aca-4b82-bf20-c73584d29535Ù$031e1106-7408-4c7e-b78e-b713c19123d1Ù$cdedd35e-52b8-40a5-938d-2d36f6f93217Ù$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4Ù$2155adfa-7a93-4960-950e-1b123da9eea4Ù$d259ecca-0249-4b28-a4d7-6880d4d84495Ù$dda222ef-8178-40bb-bf20-d242924c4fabÙ$f0f9d3d5-e76a-4472-bfb1-da29d73a7916Ù$39470c74-e554-4f6c-919d-97bec1eec0f3Ù$e9359ca3-4d11-4365-bc6e-7babc6fcc7deÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06Ù$db31579e-3e56-4271-8fc3-eb13bc95ac27Ù$b59eacf8-7f78-4015-bf2c-66f89bf0e24eÙ$02f34da1-551f-4ce5-a588-7f3a14afd716Ù$aa0791a5-8cf1-499b-9900-4d0c59be808cÙ$4ddc7d99-0b79-4689-bd93-8798b105c0a2Ù$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7Ù$2d881aa9-1da3-4d1e-8d05-245956dbaf33Ù$8bc54c94-9c92-4904-b3a6-13ff3f0110bbÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cbÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297Ù$44c49006-e210-4f97-916e-fe62f36c593fÙ$2034fd1e-5171-4eda-85d5-2de62d7a1e8bÙ$c34678f6-53bb-4f2a-96f0-a7b16f894dddÙ$9d01c0ef-6313-4091-b444-3e9765aba90cÙ$4b1a4c14-3c2b-40c0-995c-cd0334ed8b3aÙ$897fde24-9a4a-465e-96f2-dd9e8baab294Ù$f2776908-d06a-4073-b2ce-ecbf109c9cc7Ù$1115f3ec-f4b2-4fba-bd5e-321a63b10a6dÙ$c4719c42-87aa-482a-95aa-a1492d42835dÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdabÙ$8224b808-5778-458b-b683-ea2603c82117Ù$6556dafb-04fa-434c-868a-8d7bb7b5b196Ù$6faa3015-3ac4-44af-a78c-10b175822441Ù$6bffb
08c-704a-4b7c-bfce-b3d099cf35c0Ù$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0Ù$05664aaf-575b-4249-974c-d8a2e63f380aÙ$2a3e4617-efbb-4bbc-9c61-8535628e439cÙ$6e06bd39-486f-425a-bbca-bf363b58988cÙ$292d9018-b550-4278-a8e0-78dd6a6853f1Ù$047a8881-c2ec-4dd1-8778-e3acf9beba2eÙ$667666b9-3ab6-4836-953d-9878208103c9Ù$21fbdc3b-4444-4f56-9934-fb58e184d685Ù$cafedde8-be94-4697-a511-510a5fea0155Ù$c8500b89-644d-407f-881a-bcbd7da23502Ù$84584793-8274-4aa1-854f-b167c7434548Ù$6d9ae541-cf8c-4687-9f0a-f008944657e3Ù$29b0a2d5-9629-46cd-b57c-6f3ef797de66Ù$01582b3b-c4d0-4691-9edf-f77e6d8be2c9Ù$4862942b-d1e2-4ac8-8e88-65205e91a070Ù$ff5d051e-5de1-48a9-9578-5dbafd71afd1Ù$f474fcbd-e3c3-49fd-a6b7-6d6a8a7dda09Ù$2c49900b-3c57-4d9a-b3dc-ef9cc20c30c1Ù$0163763b-a15f-447e-b3d2-32d4bf9d2605Ù$2651af2d-56a8-4f7e-a56a-45cabd665c72Ù$3e367811-247b-4bd6-b8fe-63f8996fb9e8Ù$4c1b286c-2ba9-4293-81e1-bf360baa75faÙ$c5718459-2323-4615-b2c4-f92a0fa189d9Ù$03a06e10-f68a-403c-97bf-7a7627f2c5d6Ù$573a9919-bd7e-4a56-b830-4e40e91288efÙ$bce6e4ab-58ec-4e00-be34-bc4caf51f57dÙ$7d3be915-9092-4261-8435-dd546a7db144Ù$fa04d20f-6e3f-46f8-b3f7-a543d1fa360aÙ$3f4f078a-9fc4-4b02-b499-a805fd5f1071Ù$e039a5be-4b59-4023-be97-2d1de970be27Ù$3756a3f8-18e8-4d62-afa1-cfeb4183820cÙ$d526a3a4-63cc-4f94-8f55-98c9a4a9d134Ù$223055df-7d5c-4d99-bc8d-fbc9702f906fÙ$926ec37d-b969-4dc9-99b2-a6b29c6d880cÙ$c1d6532c-38a4-488f-9789-07d63fe6f125Ù$00d67a93-437c-4cda-899a-9daa1102e1f2Ù$84d81413-6334-4965-8632-8a763cd3f28aÙ$4382928c-6325-4ecd-b7cf-282525a270abÙ$69eedbfd-396f-4461-b7a1-c36abc094581Ù$8fe856ec-5f0a-4483-bb7d-3f6fe270b6f3Ù$f11dca8f-5557-49fc-9720-35034eadba57Ù$d83ff60f-8973-4dc1-9358-5ad109ea5490Ù$e4e80015-40ce-4f8a-aac7-4a9584da4baaÙ$e26f788e-f602-403e-929e-6c98a6e6bf79Ù$c9f7646a-ec01-4d90-9215-5027b7c1c885Ù$b5e06f59-33b5-414e-9a81-43e8abd07aa3Ù$0201ae9f-4a31-497e-86ab-62b454ca85deÙ$943b6d7e-14a4-4532-90c7-dd5080be0c6eÙ$0c0b875e-69f8-46ed-ad06-df9c36088fbeÙ$64b210e8-223f-41f7-a6b7-8af6183ddf87Ù$98bec66e-d8f3-4d4d-b4ec-5838489164e5Ù$297f1606-4ec2-4075-9f81-926dc
517b76fÙ$33d69db9-fa2b-40a3-bbed-21d5fd60f302Ù$42799973-9884-4a0e-b29a-039890e92d21Ù$35dc0d94-145a-4292-b0df-9e84a286c036Ù$6029990b-eb31-45ae-a869-b789fba673a6Ù$b37f2395-1480-4c7c-b6c0-eba391e969d7Ù$c306867b-f137-44f2-97dd-3d10c226ca5cÙ$a3d10753-2ec3-4252-9629-834145678b6aÙ$393cd9d2-dd97-496e-b260-ec6e8b1c13b5Ù$18e60b1d-97ec-432c-a388-003e7fae415fÙ$685a7ba3-0f94-4663-a68a-73fa03bd9445Ù$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9Ù$f95ceb98-f12e-4650-9ad3-0609b7ecd0f3Ù$ad03500a-bd42-4216-a9cb-3f923152af79Ù$c2f56287-9a3e-454a-9ec1-53184b788db9Ù$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1Ù$bb085f2e-83cb-45b2-adf6-c07da892d6e1Ù$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893Ù$d5b612d8-82a1-4586-b721-1baaea2101cfÙ$f36822d7-9ea8-4f5c-9925-dc2a466a68baÙ$639840dc-976a-4e5c-987f-a92afb2d99d8Ù$14b456f9-5fd1-4340-a3c7-ab9b91b4e3e0Ù$22c4ce8c-bd82-4eb3-8af5-55342018edffÙ$d7566d1b-8938-4e2c-8c54-124f790e72aeÙ$0748902c-ffc0-4634-9a1b-e642b3dfb77bÙ$c4919d14-8cba-43e6-9369-efc52bcb9b23Ù$95245673-2c29-401e-bb4b-a39dc8172297Ù$07c57f37-22be-4c39-8279-d80addcea0c5Ù$7ac99619-5232-4db8-8553-d79ea5415d29Ù$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4Ù$71774d5f-7841-403f-bc6b-1a0cbbb72d6dÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4Ù$0e488135-49e5-4e71-83b1-05d8e61f0510Ù$8e15f4b5-0dc7-47a5-9477-9f4d8807b331Ù$dea61907-d4fb-492d-b2bb-c037c7f785cbÙ$8787a5fd-d0ab-46b5-a7df-e7bc103a7378Ù$4019c974-dcaa-46c8-ac90-e6566a376ea1Ù$3134e913-1e86-495d-a558-c3ec4828bf7bÙ$dd167494-99d6-45c6-99e4-c36fde5e2d3fÙ$b3d4117f-7db4-43a6-8427-c08f3542d71fÙ$2455742f-dc18-4d6b-9f58-5666adac6919Ù$30e663da-282c-42ff-8171-dbe3c5c467c6Ù$7ed07ddc-1c63-4ce7-bfd3-6da54304d297±published_objectsÞ6Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/72ba1d0790a4c524„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ€?@@@¥range×€?€@¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks 
¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@¥range×€?€@¦mirrorÃ¨ticktext“   ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊCH¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data–‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ`@¤type§scatter¤text‘¡G¡xÖ`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×`@`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×`@`@Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/d6339d133c128c5b„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks 
©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@°@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@Ð@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@ð@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×ð@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@Ð@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@°@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@Ð@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@°@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×(AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal 
Path¡x×AAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/93bf178085e446c5„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ€?@@@¥range×€?€@¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks 
®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@¥range×€?€@¦mirrorÃ¨ticktext“   ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊCH¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data–‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ`@¤type§scatter¤text‘¡G¡xÖ`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @`@Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/5b7c97cc5c268b2e„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ0€?@@@€@ @À@à@AA A0A@A¥range×€?PA¨ticktextœ            ©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title ¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks 
©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsØ€?@@@€@¥range×€? 
@¦mirrorÃ¨ticktext”    ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text¸Cliff Walking Sarsa Path¡xÊ?¥widthÊC´¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡G¡xÖHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×(A8A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×8AHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×HAHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×HAHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×HAHAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/6021fa627daa4cd3„¦layout„¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¤axis¥title¢Î±¥yaxis‚¥title¤textºSum of rewards per episode¥range×Ã¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data–†¤modelines+markers¤line¤dash¤dash¡yÇLú½ÃÚ%ØÂçË·Â”Î¡Â—æ‘Â/†Â–wzÂªlÂ»aÂ‘qXÂFQÂ¯ÉJÂþTEÂO¾@Âõ³<Âût9Â…i6Â1b3ÂF!1Â¤type§scatter¤name·Intermim Expected Sarsa¡xÇLÍÌÌ=š™>ÍÌL>€>š™™>33³>ÍÌÌ>ffæ>?ÍÌ?š™?ff&?333?@?ÍÌL?š™Y?fff?33s?€?†¤modelines+markers¤line¤dash¤dash¡yÇLÐIÃÁÜåÂelÈÂ;_µÂH,¨ÂöÙžÂH˜ÂÛb“Â8äÂ¦IŽÂ°.Â¼yŒÂPbŒÂÂdŒÂ&wŒÂ—ŒÂ~ŒÂŸpŒÂ³èªÂ¤type§scatter¤name®Intermim Sarsa¡xÇLÍÌÌ=š™>ÍÌL>€>š™™>33³>ÍÌÌ>ffæ>?ÍÌ?š™?ff&?333?@?ÍÌL?š™Y?fff?33s?€?†¤modelines+markers¤line¤dash¤dash¡yÇLWîÃ¢üÂ€âÝÂãeÊÂ©@½ÂG@´Â_tÂw¨ÂÏ£¤Â«q Â´ÂX$›ÂT7™Âðš—ÂLb–Â/€•ÂBU•Âí•Â³P§Â¤type§scatter¤name³Intermim Q-learning¡xÇLÍÌÌ=š™>ÍÌL>€>š™™>33³>ÍÌÌ>ffæ>?ÍÌ?š™?ff&?333?@?ÍÌL?š™Y?fff?33s?€?†¤modelines+markers¤line¤dash£dot¡yÇLù¦ÁFv¦Á¤Z¦Á ¦ÁÜl¦Á¦Á‘2¦Á~Ó¥Á(8¦ÁqÑ¥Á6J¦Á¢Ú¥ÁÚž¥Á••¥ÁAÓ¥ÁŽî¥ÁÍ¼¥Ážã¥Á%å¥Á¤type§scatter¤name¹Asymptotic Expected Sarsa¡xÇLÍÌÌ=š™>ÍÌL>€>š™™>33³>ÍÌÌ>ffæ>?ÍÌ?š™?ff&?333?@?ÍÌL?š™Y?fff?33s?€?†¤modelines+markers¤line¤dash£dot¡yÇLœŒ¯Á@í²Á›í´Á¿¸Á×»ÁdnÀÁ¢žÅÁ`÷ËÁÊt×ÁKæÁ²ðøÁòÂv×Âdº1ÂÛQÂØ€Â…¾¨Â±}ÃxÌƒÄ¤type§scatter¤name°Asymptotic Sarsa¡xÇLÍÌÌ=š™>ÍÌL>€>š™™>33³>ÍÌÌ>ffæ>?ÍÌ?š™?ff&?333?@?ÍÌL?š™Y?fff?33s?€?†¤modelines+markers¤line¤dash£dot¡yÇLDLÂÎsKÂKÂ–KÂjKÂÃ:KÂê•KÂÄÈKÂ:FKÂeÛKÂ†·KÂ 
ËKÂ]LÂý·KÂ€üKÂRKÂ|òKÂæÉKÂÞšKÂ¤type§scatter¤nameµAsymptotic Q-learning¡xÇLÍÌÌ=š™>ÍÌL>€>š™™>33³>ÍÌÌ>ffæ>?ÍÌ?š™?ff&?333?@?ÍÌL?š™Y?fff?33s?€?Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/d3030aa42e1dd0c8„¦layout…¥xaxis¥title¤text°Walks / Episodes¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks 
®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤text¿RMS error, averaged over states¥title®Batch Training¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data’„¡yÈ”ï[q>×Òn>k>ß6f>²`>ê†Y>° Q>šI>NA>Ó&9>ö²1>cñ)>X#>÷M>u¿>r>VŸ>ôê>:>åKú=…º÷=Pó=^í=@4é=QÑå=Âß=›HÙ=ƒÄÓ=e¢Î=ã Ê=ÝÄ=¹Â=]•¾=^¼=œõ»=,ž»=ÛBº=*¨½=iÔÀ=ÂÁ¾=Rn¾=Ìå¼=º=úñ¶=w‹µ=‰³=ª±²=XN±=è:¯=bX=UŠ«=]Ö©=¤ï«=3=¨=Šâ¦=‘¹¥=ù =)-Ÿ=âåž=ìœ=ÙÐ›=µíš=¶™=€h—=«Ç—=«µ–=ñ£•=)P‘=ç’=Î“=„‡”=U•=êÀ’=_‘=ù€Œ=Ä°Œ=DÔ‹=ðý‰=Ê*‹=Þ‚Š=¡×Œ=ßG‹=a1Œ=0ª‰=EV†=Ž‡=Â„†=<–‡=Áp‰=ÖvŠ=bˆ=êj†=”$‡=Ã=‡=6ˆ=Ãì†=Æ_ˆ=mŠ=îvŽ=ð4•=ôé“=¤type§scatter¤name¢MC¡xÈ”€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8Bâêo>€m>½àh>Nnc>žÒ\>Œ+U>i˜L>:;C>ë39>Š².>ÇÙ#>ÂÑ>8Æ >à>¤ð=È5Ü=¶É=åx·=)Œ§="‚™=x=BMƒ=ç[u=]gh=8½^=èX=++T=Š_R=[R=DT=>ÅU=X=\Z=•W\=g¹^=ez`= \b=5d=ße=J¹e=ce=ùŒd=[c=ãÂa=ÒZ_=Î]=Î[=ë’Z=Ð:Y=UW=Æ|U=ŠjT=S=ð¶Q=ÜéO=égN=,ñL=fLJ=¿íG=§‹E=íB=Øs@=³>=Ë’==ŸO<=Õé;=r:;=*:=RM9=^8=³$7=°6=|’6=Ã6=Ás5=#5=RÂ3=›3=ï2=¸â0=H70=‹/=‡ë-=Ó,=õ+=Àê*=ˆ)=\'=b¾%=A|%=â%= ã$=àÔ$=€m$=¥‘#= ú"=Ìý!=”£ =|ó=ö=¤type§scatter¤name¢TD¡xÈ”€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8B¨ticktext–®leaving office©reach 
car¯exiting highway«2ndary road«home street«arrive home¦anchor¢y1¦yaxis1ƒ¥title¤textÙ Predicted total
travel time¦domain×€?¦anchor¢x1¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦xaxis2…¨tickvals–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive¥title¤text¥State¦domain×ÍÌ?€?¨ticktext–®leaving office©reach car¯exiting highway«2ndary road«home street«arrive home¦anchor¢y2¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¦yaxis2‚¦domain×€?¦anchor¢x2¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡¤line¥color¥black¥xaxis¢x1¡yÇðAàAàAàAàAðA¤type§scatter¤name®actual outcome¥yaxis¢y1¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arriveˆ¤mode¥lines¤line‚¥color¥black¤dash¤dash¥xaxis¢x1¡yÇðAðAðAðAðAðA¤type§scatter¤name¶Monte Carlo Prediction¥yaxis¢y1¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive‰ªshowlegendÂ¡x’§leaving§leaving¤line¥color£red¥xaxis¢x1¡y×ðAðA¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’©reach_car©reach_car¤line¥color£red¥xaxis¢x1¡y×àAðA¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¬exit_highway¬exit_highway¤line¥color£red¥xaxis¢x1¡y×àAðA¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦snd_rd¦snd_rd¤line¥color£red¥xaxis¢x1¡y×àAðA¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’§home_st§home_st¤line¥color£red¥xaxis¢x1¡y×àAðA¤type§scatter¤name°Mone 
Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦arrive¦arrive¤line¥color£red¥xaxis¢x1¡y×ðAðA¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previousˆªshowlegendÂ¤line¥color¥black¥xaxis¢x2¡yÇðAàAàAàAàAðA¤type§scatter¤name®actual outcome¥yaxis¢y2¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arriveˆ¤mode¥lines¤lineƒ¥color¥black¤dash¤dash¥shape¢hv¥xaxis¢x2¡yÇàAàAàAàAðAðA¤type§scatter¤name°TD(0) Prediction¥yaxis¢y2¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive‰ªshowlegendÂ¡x’§leaving§leaving¤line¥color£red¥xaxis¢x2¡y×ðAàA¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’©reach_car©reach_car¤line¥color£red¥xaxis¢x2¡y×àAàA¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¬exit_highway¬exit_highway¤line¥color£red¥xaxis¢x2¡y×àAàA¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦snd_rd¦snd_rd¤line¥color£red¥xaxis¢x2¡y×àAàA¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’§home_st§home_st¤line¥color£red¥xaxis¢x2¡y×àAðA¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦arrive¦arrive¤line¥color£red¥xaxis¢x2¡y×ðAðA¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previousÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/a7c05c6ee7bae052„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ(Value Iteration Policy
Path Example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data™‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×ð@AÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/76a25ffbba40a531„¦layout„¥xaxis¥title¤textªTime steps¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks 
©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‚¤type£log¥title¤text±Steps Per Episode¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘ƒ¡yÈ@€{D€âC€¨C‚B€¸C8B¸BC€«CœBúB®B?ChBºBèAB€—C%CTB CDBB`A CJCðApB`BCÐB€ÇCžBÌB@AŒCÀD CòB˜A(BÐBBÀA{CBB®B|BÔBÀAªBB0A:C˜AàAŠBBB$BÀA/CpB|B$BðA¨APBØAÀA}C|B!CØA€AðAŒB¾BàAhB,B¼B„B8B„C AàBBlB¦BÚBC0B{C\B`BBBBwC¸B¢BpBA°A AðApA 
A¨A Aà@B@A`A¶BA˜AAèA$B@AØAøA€BHBA|Bà@0A¨A¼BŒBC@BAbCAClB0ACA~CB4C@B1CPCBTBCöB(BðA˜AÔBC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈCÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/59425f0a62718546„¦layout„¥xaxis¥title¤textªTime steps¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks 
©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤text¨Episodes¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘„¤line¥color£red¡yÈ@€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ 
C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC¤type§scatter¡xÈ@à—D€ËD@ÝDôD0E EðE0E 3EP5E€;EP>E€@E°CE°IENEpVE`XEP[E@^E€`EbEðeEiE`mE€pE€wEðyE`}EX€Eè‚E0„E…Eh†E ˆEŠEè‹E8ŒEàEàEØ‘E“E —EpšEÀšEø›EhœEhExžE0 EÐ EÈ¡EÀ£E0¤EÐ¤Ep¥EÀ¥E¦E`¦EÀ¦E8©EÐ©E¸ªE0«Eˆ«Eð«EEèE(¯Eà±Eˆ²E³EX´EhµEÈµEØ·E€¸E@¹E˜¹EØ¹E°ºEx»E¸»E(¼E¸¼Eh½E¾EX¾E ¾E¸¿E ÀE ÀEPÁEØÁEhÂEØÂEÃEPÄE8ÆE€ÆEÇEPÇE`ÈEèÈEøÉEhÊE¸ÊEèËEøÌEèÍExÐEHÒEÒEÓEpÓE°ÓEÔEHÔEØÔEPÕEðÕEØÖE(×EÈ×E8ÙEˆÙExÚEèÚExÛEpÜEÝEðÝE`ÞE¸ÞE ßEˆßE0àEÐàEâEøâE ãEäE@äEäEÐäEåE@åExåEæEXæE¨æEçEXçE˜çEðçEˆèEÐèEPéE˜éEÐéEêE¨êEèêE ëEhëE ëEàëEìEPìE¨ìEðìE(íEpíEîEHîE€îE€ïEðE˜ðE ñEøñE€òEàòE(óEÀóEöE@öE€öE¸öEà÷EøEPøE¸øEùE˜ùEÐùEúE@úExúE°úEØûE`ýEÐýEþEPþEˆþEHÿE€ÿE¸ÿEðÿEF0FLFhF¬FÐFF$FDF`F|FàFüFFLFlF¤FÐFìFtF”F´FÐFìF@F FÔFF<F`F|FŒF¨FF0FPFˆF¤FÄFFPFlFØFôF FL Fˆ FÀ F F8 Fx Fœ FØ Fø F<FXF˜F´FÐFF8F\FœFÀFÜF< Fl FŒ FÐ Fì F(FHF F4FèFF8FTFtFœF¸FF FDF`FÀFÜFøFF4FXFœF¸FÔFFTFpFFäF,F|F FÀFFF`F€FF$F€FÔFøFF0F¬FÌFðF8F°FØF FðFF(FŒF¨FÈFèFF,FLFhF”F´FÔFF¤FÄFìFF8F\FàF FHFœFôFXF¸FÜF FT Fx F´ Fð Fh!F„!F´!Fô!FH"Fx"FÀ"FÜ"Fø"F#FL#Ft#F¤#FÀ#F($FX$F¸$Fä$F@%F„%F¬%FÜ%Ft&F &F¼&FÜ&F'F,'FL'FÈ'Fè'F(F@(FŒ(F¸(FÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/4cf46394be540b73„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ€?@@@¥range×€?€@¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@¥range×€?€@¦mirrorÃ¨ticktext“   ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊCH¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data™‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ`@¤type§scatter¤text‘¡G¡xÖ`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×`@`@Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/a0944b0f6ba4cc1f„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ0€?@@@€@ @À@à@AA A0A@A¥range×€?PA¨ticktextœ            ©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title ¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsØ€?@@@€@¥range×€? 
@¦mirrorÃ¨ticktext”    ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text¸Cliff Walking Sarsa Path¡xÊ?¥widthÊC´¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡G¡xÖHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×(A8A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×8AHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×HAHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×HAHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×HAHAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/f97aed3be1675ad6„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ€?@@@¥range×€?€@¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 
0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@¥range×€?€@¦mirrorÃ¨ticktext“   ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊCH¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data–‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ`@¤type§scatter¤text‘¡G¡xÖ`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @`@Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/6eecf72f2f10b69c„¦layout†¦xaxis1…¨tickvals–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive¥title¤text¥State¦domain×ffæ>¨ticktext–®leaving office©reach car¯exiting highway«2ndary road«home street«arrive home¦anchor¢y1¦yaxis1ƒ¥title¤textÙ Predicted total
travel time¦domain×€?¦anchor¢x1¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦xaxis2…¨tickvals–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive¥title¤text¥State¦domain×ÍÌ?€?¨ticktext–®leaving office©reach car¯exiting highway«2ndary road«home street«arrive home¦anchor¢y2¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¦yaxis2‚¦domain×€?¦anchor¢x2¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡¤line¥color¥black¥xaxis¢x1¡yÇðA BB B,B,B¤type§scatter¤name®actual outcome¥yaxis¢y1¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arriveˆ¤mode¥lines¤line‚¥color¥black¤dash¤dash¥xaxis¢x1¡yÇ,B,B,B,B,B,B¤type§scatter¤name¶Monte Carlo Prediction¥yaxis¢y1¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive‰ªshowlegendÂ¡x’§leaving§leaving¤line¥color£red¥xaxis¢x1¡y×ðA,B¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’©reach_car©reach_car¤line¥color£red¥xaxis¢x1¡y× B,B¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¬exit_highway¬exit_highway¤line¥color£red¥xaxis¢x1¡y×B,B¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦snd_rd¦snd_rd¤line¥color£red¥xaxis¢x1¡y× B,B¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’§home_st§home_st¤line¥color£red¥xaxis¢x1¡y×,B,B¤type§scatter¤name°Mone 
Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦arrive¦arrive¤line¥color£red¥xaxis¢x1¡y×,B,B¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previousˆªshowlegendÂ¤line¥color¥black¥xaxis¢x2¡yÇðA BB B,B,B¤type§scatter¤name®actual outcome¥yaxis¢y2¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arriveˆ¤mode¥lines¤lineƒ¥color¥black¤dash¤dash¥shape¢hv¥xaxis¢x2¡yÇ BB B,B,B,B¤type§scatter¤name°TD(0) Prediction¥yaxis¢y2¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive‰ªshowlegendÂ¡x’§leaving§leaving¤line¥color£red¥xaxis¢x2¡y×ðA B¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’©reach_car©reach_car¤line¥color£red¥xaxis¢x2¡y× BB¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¬exit_highway¬exit_highway¤line¥color£red¥xaxis¢x2¡y×B B¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦snd_rd¦snd_rd¤line¥color£red¥xaxis¢x2¡y× B,B¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’§home_st§home_st¤line¥color£red¥xaxis¢x2¡y×,B,B¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦arrive¦arrive¤line¥color£red¥xaxis¢x2¡y×,B,B¤type§scatter¤name«TD(0) Error¥yaxis¢y2¦marker‚¦symbol¬arrow-bar-up¨angleref¨previousÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/a1553d03eb644044„¦layout†¦xaxis1…¨tickvals–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive¥title¤text¥State¦domain×ffæ>¨ticktext–®leaving office©reach car¯exiting highway«2ndary road«home street«arrive home¦anchor¢y1¦yaxis1ƒ¥title¤textÙ Predicted total
travel time¦domain×€?¦anchor¢x1¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦xaxis2…¨tickvals–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive¥title¤text¥State¦domain×ÍÌ?€?¨ticktext–®leaving office©reach car¯exiting highway«2ndary road«home street«arrive home¦anchor¢y2¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¦yaxis2‚¦domain×€?¦anchor¢x2¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡¤line¥color¥black¥xaxis¢x1¡yÇðABBBBB¤type§scatter¤name®actual outcome¥yaxis¢y1¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arriveˆ¤mode¥lines¤line‚¥color¥black¤dash¤dash¥xaxis¢x1¡yÇBBBBBB¤type§scatter¤name¶Monte Carlo Prediction¥yaxis¢y1¡x–§leaving©reach_car¬exit_highway¦snd_rd§home_st¦arrive‰ªshowlegendÂ¡x’§leaving§leaving¤line¥color£red¥xaxis¢x1¡y×ðAB¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’©reach_car©reach_car¤line¥color£red¥xaxis¢x1¡y×BB¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¬exit_highway¬exit_highway¤line¥color£red¥xaxis¢x1¡y×BB¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’¦snd_rd¦snd_rd¤line¥color£red¥xaxis¢x1¡y×BB¤type§scatter¤name°Mone Carlo Error¥yaxis¢y1¦marker‚¦symbol¬arrow-bar-up¨angleref¨previous‰ªshowlegendÂ¡x’§home_st§home_st¤line¥color£red¥xaxis¢x1¡y×BB¤type§scatter¤name°Mone Carlo 
[Figure: driving home example; Monte Carlo error, actual outcome, TD(0) prediction and TD(0) error by state (leaving, reach_car, exit_highway, snd_rd, home_st, arrive)]
[Figure: random walk; left panel "Estimated Value with TD(0) with α = 0.2" (x-axis: State A to E; curves: True values, 0, 1, 7, 15 and 99 episodes); right panel "Empirical RMS error, averaged over states" (x-axis: Walks / Episodes)]
[Figure: "Optimal policy path example" (start S, goal G, optimal path)]
[Figure: Sum of rewards during episode vs. Episodes (Sarsa, Q-learning)]
[Figure: Average steps per episode during training vs. Episodes (Sarsa, Expected Sarsa)]
C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC€ÈCÉC€ÉCÊC€ÊCËC€ËCÌC€ÌCÍC€ÍCÎC€ÎCÏC€ÏCÐC€ÐCÑC€ÑCÒC€ÒCÓC€ÓCÔC€ÔCÕC€ÕCÖC€ÖC×C€×CØC€ØCÙC€ÙCÚC€ÚCÛC€ÛCÜC€ÜCÝC€ÝCÞC€ÞCßC€ßCàC€àCáC€áCâC€âCãC€ãCäC€äCåC€åCæC€æCçC€çCèC€èCéC€éCêC€êCëC€ëCìC€ìCíC€íCîC€îCïC€ïCðC€ðCñC€ñCòC€òCóC€óCôC€ôCõC€õCöC€öC÷C€÷CøC€øCùC€ùCúCÙ49c6be96e-38f7-11f0-2d30-a71f02755abc/13d8f542ac69f87„¦layout„¥xaxis¥title¤textªTime steps¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks 
®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤text¨Episodes¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘„¤line¥color£red¡yÈ¨€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BcŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataœ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×AAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/5f08b9d1ec5530fd„¦layout„¥xaxis¥title¤text¨Episodes¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‚¥title½Sum of rewards during episode¥range×ÈÂpÁ¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data“„¡yÈÐ)ìçÄ¤pãÃš™!Ãš&Ã33ÏÂ)Ü ÃìQÕÂ= ÌÂõÂ×£µÂ…ÓÂ.ºÂ. 
Â®G™Âff•Â{”šÂ3³›Â)\¬Âö(“Â®GqÂHaŽÂÃõrÂR¸|ÂázÂÂSÂšÂÂQÂ^ÂìQIÂ×£RÂ®7Â€ŒÂ ×Â×£.Â= ?Â®sÂHáÂÂ*Âš™8ÂázPÂ…ë*Â×£7ÂHá8Â{+Â×£ Â{Â ×Â¸)Âö(ÂffÂ{*ÂìQÂ ×(Â!Â¤pÂ×£ Â¸Â)\Â¸"Â= *ÂúÁ¸ÂHáÂ= Âö(êÁÍÌÂÃõÂ…ëÂ…ëïÁ×£ÂìQþÁ= /Â= ûÁö(ÖÁ…ÍÁÂq=úÁ®ÂffÂ®éÁffôÁázêÁ= óÁÂ Â{êÁ…ëçÁq=ôÁ)\*ÂÂÂìQØÁ\Âš™Â®GÂ®GëÁffàÁ ×çÁÂéÁ¸ñÁÂÁÍÌ Â¸ÁÁ33Â®GõÁ×£àÁÍÌÂ¤pÍÁ…éÁ{ÔÁ{ÆÁ¤påÁìQ ÂffÔÁ®GÍÁ ×ÂÃõÐÁ ×ÿÁæÁR¸äÁ= Â…ëïÁÂÓÁ{öÁ{ÂÁ¸¿Áš™Â ×éÁ¤pßÁÃõ¼ÁìQèÁÍÌÄÁ…ÏÁq=øÁö(Âö(îÁq=¾Á ×çÁ¸ÍÁ= åÁq=úÁHá ÂÍÌÎÁ ×ßÁq=âÁö(Â ×Âq=ÂÁffêÁ…¹Á¸ÝÁq=ØÁ®GÍÁÍÌØÁ¤pÕÁ)\éÁÒÁìQÂ ×óÁ®ãÁ33áÁÂßÁ¸ ÂázÌÁ)\éÁ®GõÁ…ÍÁö(ÒÁ¤pÉÁ ×ûÁ¤p ÂìQÌÁÃõÂ)\ÝÁff¸ÁäÁö(ÆÁázúÁ®GßÁázþÁffìÁq=ØÁ\¸ÁHáþÁ ×ßÁ¸×ÁHáÌÁ®ïÁ®GÝÁìQÂìQÂ)\ÏÁÎÁ¸ÏÁö(âÁ¸çÁÖÁ®ÁázÆÁ33ÉÁÍÌºÁÍÌÆÁ)\©Á ×§Á®GÃÁ…ëÕÁ…ëÍÁq=ÐÁÂ¿ÁÂ½Á{èÁ¸ùÁ…ëÍÁ= óÁ33ÏÁR¸ÞÁ…ë½Á…ËÁ ×¿Á ×ÉÁ×£Â…ÉÁÂÁ¸ÓÁÍÌòÁHáÖÁ)\×Á®GÏÁÂ×Áq=àÁ ×éÁÍÌÆÁš™·ÁÃõºÁš™·Á×£ÖÁìQøÁ¤p¿ÁÂáÁÍÌ¼Á ×ÁÁázÌÁìQÞÁÂÍÁÃõÖÁÂãÁq=¶ÁÂ®GáÁázÖÁázÆÁ…ëÏÁö(´Á…ëíÁ ×Á¾Á= »Á33ñÁHáÞÁÃõ¸Á{¸ÁÂµÁ\ÔÁ¸ÿÁš™½ÁázäÁ ×ïÁ¸ßÁ¸ÍÁ= ÓÁHáÌÁö(ÖÁÍÌìÁ¤pÂ®óÁ¸¿Á\ÄÁ…½ÁHáêÁö(ÈÁázÆÁ®GÑÁ{ÚÁq=ÒÁ®¹ÁázäÁÂ®»Á)\·Á×£àÁš™ÛÁ= ÷ÁÚÁffÂ33ÅÁ= ¿Áq=ÀÁffìÁö(êÁ…ëûÁ\ÌÁÍÌ¾ÁR¸ÊÁ ×ÛÁìQÄÁR¸ÌÁÍÌªÁÚÁö(îÁ ×Â= ûÁázÎÁázØÁ ×ÕÁâÁ= ïÁ{Â×£ÒÁ¸ÓÁ…ÇÁ…ëµÁq=¶Á®GÍÁÍÌÞÁ¤pñÁffÌÁ¤p³Áö(îÁ¤pÂHáÂR¸øÁázþÁÃõÂ)\çÁ¼Á ×½Á33ëÁ®åÁ®ëÁ33ùÁ¸ñÁffÒÁÍÌÜÁ33ÛÁ= ßÁR¸êÁÂÂ…ëýÁÃõÒÁ= ÇÁ¤pÂ33ïÁ®ëÁHáÚÁ)\ÑÁ)\ÉÁÍÌÜÁÂÝÁ33ÕÁö(ÜÁR¸¬Á®GÝÁ®çÁ¾Á33ýÁ…ëÍÁ{èÁ33ßÁÃõ¼ÁHá¦ÁÃõ®ÁR¸þÁÃõØÁR¸þÁffÐÁR¸Â= )Â®GÂ×£âÁ33ÿÁ…ÏÁ¸ÿÁq=ÔÁ\´Á¤p÷ÁÂÂÂHáþÁ×£ÚÁÂö(ÖÁö(âÁ33ãÁff¾ÁHáæÁq=äÁ×£Âö(äÁ¸ñÁR¸Âš™ÂÂ×Á\ÔÁš™ÓÁö(ÜÁ= çÁ)\Â®ãÁ ×ßÁ¤pÙÁ ××ÁffÔÁ…ëÍÁ33ÁÁ¤pÙÁ…±Áq=ÀÁÂ¯Á33ïÁÊÁ{ÜÁÍÌÂÃõôÁÃõüÁÃõÆÁ= ×Á®µÁÃõÚÁ¤pÓÁ…ëéÁffÒÁ\èÁffòÁ…ëßÁ¸ÏÁR¸ÀÁ…ÏÁ®GÍÁq=Â33ñÁ¤p¿Á33ÅÁR¸ÖÁ¸ÂffÄÁ¤pÅÁö(ÂÂ½ÁHáÈÁHá´Á)\ÏÁ{¶Á×£´Á¸ÃÁHáÆÁ×£àÁ\æÁÃõÀÁ®GáÁHáÒÁ¸ÑÁ{ÖÁ= åÁ\ÌÁ33ÛÁázÎÁ¸ÝÁ= ÉÁHáÞÁR¸ÖÁffÂÃõðÁ)\ËÁ®ÝÁ{Â= ÷ÁÍÌÔÁ\ÆÁ¸ÁÁq=ÌÁ\ÀÁš™ÿÁR¸ÂÁ¸ëÁ×£Â…ÍÁ×£ìÁ¸ÇÁ ×ÃÁ…ÕÁ®½Á…éÁ¸ÃÁff´Á)\¿ÁfföÁázÄÁ¤pÕÁ\èÁÃõÌÁ\ôÁ¤pÓÁ¤type§scatter¤name¥Sarsa¡xÈÐ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB 
B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC€ÈCÉC€ÉCÊC€ÊCËC€ËCÌC€ÌCÍC€ÍCÎC€ÎCÏC€ÏCÐC€ÐCÑC€ÑCÒC€ÒCÓC€ÓCÔC€ÔCÕC€ÕCÖC€ÖC×C€×CØC€ØCÙC€ÙCÚC€ÚCÛC€ÛCÜC€ÜCÝC€ÝCÞC€ÞCßC€ßCàC€àCáC€áCâC€âCãC€ãCäC€äCåC€åCæC€æCçC€çCèC€èCéC€éCêC€êCëC€ëCìC€ìCíC€íCîC€îCïC€ïCðC€ðCñC€ñCòC€òCóC€óCôC€ôCõC€õCöC€öC÷C€÷CøC€øCùC€ùCúC„¡yÈÐÍâÄ®G×ÃÃuCÃ3sÃf&ÃÃõéÂ.ïÂ=ŠßÂÃõÍÂ= ¢ÂÃuÓÂ¤pßÂ W½ÂR8ÏÂ= ÂÂ33µÂÂµÂÍÌšÂ¤p{Â¸†ÂHá€Â®]ÂR¸yÂ3³Â®GzÂÃõrÂ= gÂìÑ”Â)\iÂš™fÂ®ZÂš™fÂ\eÂ33\Â…ë[ÂffLÂázdÂ33BÂ\9Â¸+ÂÂEÂ…1Â)Âö(ÂÂ;ÂÍÌ4Â®5Âö(Â= PÂ¤p<Â¤p?Âq=;Â®G]ÂR¸KÂ)\NÂ= oÂ ×TÂR¸uÂ®GtÂ®jÂ= (Âáz3Â= Â)\óÁ{cÂ¸PÂáz Âff6ÂR¸WÂö(YÂ¸IÂÂLÂ¸&ÂìQ3Â®GDÂázBÂ)\?Âš™:Â®GJÂ\DÂ…2Â×£$Â®wÂB„Â®GÂö(_ÂÂ=ÂÃõJÂ×£Â×£VÂìQKÂffOÂ W‘Âq=?ÂázTÂ¸NÂ\Â…1Âff]Â33LÂ ×UÂ\oÂB‚Â¸?ÂR¸øÁ×£Â®G!ÂffcÂq=^ÂÂ'Â33TÂ®GÂ…8Â®GLÂìQ0Âáz@Â ×5ÂÍÌSÂ…]ÂìÑ€Â33Â…4Â ×2Â¤pÂR¸WÂÂCÂö(‹ÂÃõ4Âš™`Â\^ÂR¸-Â= pÂ ×9Â¸ÂìQrÂÃõiÂ¸Â¸hÂ)\CÂ¤pÂR¸=Âáz?ÂázFÂ\'Â\:Â\WÂ…ëYÂš™TÂq=BÂ®G Â\TÂ ×AÂš™*Â×£#Â¸Â= -Â®G8Â33?Â= DÂ¤p#ÂaÂR¸YÂ¤pÂ{$Â)\JÂÂPÂ)\9Â®G7Â¤p&ÂÂAÂìQKÂ×£RÂHá‚Âö(NÂffPÂ\dÂ¸\ÂázXÂ33?Âq=YÂR¸1Â×£5ÂáznÂÂfÂ)\Âö(PÂÍÌFÂR¸=Â33=ÂR¸8ÂìQ1Âö(=Âq=VÂ×£NÂ¸XÂ{)Â= ÂìQRÂÍÌÂHátÂÂSÂ×£\Â33.ÂffBÂ= XÂ…ë`Â33[Â¸CÂÂAÂlÂ¤pKÂÃõDÂ¸bÂ{hÂÃõ\Â…ë>Â33$Â33^Âš™%Âáz]Â ×Â…mÂ…ë!Â¤p)Âš™%Â= PÂR¸{Âff9Âš™;ÂR¸bÂ33^Â®G1Â®G-ÂR¸4Â)\=ÂÃõmÂö(ZÂš™AÂ¤paÂÍÌOÂ\GÂ{:Â33ÂR¸JÂffJÂÂ:ÂázSÂ×£Â×£WÂ{IÂ¤p,ÂFÂ¤pjÂR¸gÂ ×.Â×£NÂ= ÂìQdÂ)\wÂ\9Â×£HÂffÂ{CÂÃõ~Âö(ZÂš™AÂš™EÂ33;Â…4Â ×\ÂffBÂ>ÂÃõEÂ×£ Â…ërÂ\yÂ×£-Â×£&ÂìQ(ÂÍÌWÂ®GbÂáz7Â33iÂ…ë5Â…:Âáz Â{bÂÂ@ÂÍÌ{Âáz6Â\$Â= *Âš™FÂ33dÂ>Â)\XÂ®lÂB–Âq=MÂ{WÂ¤p0Â®-ÂÃõhÂ{mÂázOÂÃõ?Â…ë+ÂÂCÂ¤pdÂ…(ÂÍÌJÂÃõAÂ®G9Â= !Â ×iÂB„ÂffTÂ…0ÂR¸$Âq=JÂ)\ƒÂÍÌRÂ¸+ÂšŒÂ)\?ÂR¸-ÂR¸$Â…NÂ…ë9Â®GÂÃõ)ÂR¸jÂff%ÂÃõCÂ\4ÂÂ.Â{KÂHá3ÂHáVÂ{6Â®G4ÂHáDÂ¸$Â 
×bÂ®G?ÂÂWÂ ×/ÂÂuÂ¤pFÂ…sÂ{ÂR¸0Âš™TÂ®GhÂìQAÂ\BÂÂ:ÂìQ%ÂR¸OÂ{€Â ×JÂ…TÂÃõ:Â®#Â¸:Â¸mÂ{XÂÃõGÂ\uÂq=7ÂÂMÂ33OÂ ×eÂffaÂq==ÂHá>ÂxÂq=EÂ®FÂáziÂ…6ÂÃõIÂÃõTÂ\GÂHá@Â\QÂázPÂ® Â®:Âq=4ÂÃõmÂÃõFÂHá>ÂÂ>Â\uÂ{MÂHáZÂLÂáz,ÂÍÌRÂ33JÂ33[Â®?Â®\Â…ë5Â×£AÂ= BÂÃõUÂ¤pSÂffVÂ ×MÂ{%Âš™WÂ33JÂ ×yÂ®G<Â®G@Â ×Â¤pfÂ\ÂìQNÂ ×SÂ)\9Â33<ÂÃõPÂTÂö(0Â\^ÂHá>Âš™9Â3Â®GIÂHá6Â\PÂÃõÂ{;Â\FÂìQ>Â33*Â®1Âff*ÂffIÂ…]Â…Â\^Â…ë%Â33HÂHá9Â…ë4Â{jÂ×£&ÂHáEÂ{~Â33GÂ…SÂÍÌ8ÂìQLÂ= ?ÂHá0ÂR¸?Â ×jÂ®7Â¤p:Â33PÂš™zÂ\Â{Â8Âq=HÂ\xÂš™Â¤p‹Â ×GÂ ×9Â…ÂÂ3Â®G<Â\5ÂHaˆÂ†Â)\*Â{PÂ®G#ÂR¸PÂ{IÂHáXÂÂ7Â ×IÂ¸-ÂR¸BÂ×£MÂ¸9Â)\cÂ\hÂ{wÂö(LÂ)\bÂázMÂPÂÂÂ¤type§scatter¤nameªQ-learning¡xÈÐ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC€ÈCÉC€ÉCÊC€ÊCËC€ËCÌC€ÌCÍC€ÍCÎC€ÎCÏC€ÏCÐC€ÐCÑC€ÑCÒC€ÒCÓC€ÓCÔC€ÔCÕC€ÕCÖC€ÖC×C€×CØC€ØCÙC€ÙCÚC€ÚCÛC€ÛCÜC€ÜCÝC€ÝCÞC€ÞCßC€ßCàC€àCáC€áCâC€âCãC€ãCäC€äCåC€åCæC€æCçC€çCèC€èCéC€éCêC€êCëC€ëCìC€ìCíC€íCîC€îCïC€ïCðC€ðCñC€ñCòC€òCóC€óCôC€ôCõC€õCöC€öC÷C€÷CøC€øCùC€ùCúC„¡yÈÐ¤8ÂÄ×£¤Ã33Ã…òÂHáÍÂázÍÂáúÎÂ)Ü¾ÂÍLœÂ×#Â×£ÈÂB‚Â’Â)Ü”Â)\Â€ÂÍLÂáz]Â×£]ÂÂPÂ= VÂR¸WÂ×£@ÂìQ/ÂffJÂÍÌ5ÂÂ=Â¸<ÂázÂ)\ÂÂÂ×£Â ×Â= Â…õÁ ×Â33ÂffüÁ®Â{æÁö(ÔÁ)\Â…ëÁö(èÁö(¾Áq=ÄÁR¸¾Á¤páÁ{ÌÁ…ëÏÁ= ãÁHá´Á¸ÕÁ ×ÕÁ= £Áö(ÌÁ×£ÂÁ×£ªÁ{àÁ= ÍÁÃõÂÁš™ÉÁ…ë±Á…ë·Á¤pÁff¼Á…ë«Á33¹ÁÂ£Áq=˜Á¸ÁÁ ×ÍÁ¤p¹Á…ë¯ÁHáâÁ…ëÇÁ\´Á…ëÅÁ¸¹Á\¾Áff”ÁìQ²ÁÃõ”Á®G«Á)\½Á¸ÇÁ= ¥Á33¯Á ×“ÁR¸èÁ= ÁázÆÁ¤p§Á…¥Á ×µÁ\ÀÁHá¼Á…·ÁHá¶Áff®Á\°ÁR¸°Á ×¥ÁázœÁ33³Áq=®ÁÃõªÁ¤p‘Á…ë»ÁšÁffœÁázÞÁ¸³ÁìQˆÁ{¼ÁR¸œÁ¸ÁffèÁìQ¢ÁÂ§Á®G›Á…ë³Á33§Á…ë£Á×£ÂÁÍÌ¬Á…©ÁffÁáz´ÁÃõ¼Á)\µÁ ×£ÁÃõ’Á ×¹Ááz Áš™»ÁÈÁ)\µÁáz¸ÁR¸¢Áff¦ÁÃõÀÁ= ¹ÁHá²Á ×¡Á33›Á{´Áq=¢Á= ›ÁR¸¤ÁázžÁš™“Á¤p¹Á¸ÅÁHá¢Áq=¤Á{šÁìQÁ 
×±Á…›Á¤p‘Á…³Á{¬Á…¡ÁHá¨Á…ÅÁ\’Áš™£Á33£ÁHá¸Áö(´Á×£šÁáz¨Á…ÉÁázÁö(ÄÁìQÁœÁázÊÁö(¬Áq=²Á®G£Á²ÁHášÁ¤pÉÁÂ«Á…ëÁ{¤ÁR¸¨Á…ëÁÁìQªÁff¤Á¤Áš™ÁÃõ¶Á{¬Á…ëŸÁÃõ´Á¤p‘Á…³ÁÂ§Áq=²Áš™½Á¤p“Áš™¡Á= ËÁö(ŒÁ{˜Ááz¬Á×£ÈÁÃõ–Á\–Á¤ÁŽÁ×£ˆÁš™›Áš™›Áš™¹ÁHášÁÍÌ¨Á{¦ÁÂµÁ…ë©Á…ë›Á= ©ÁÁÂ«ÁHá¾Áq=ªÁáz¢Áš™—ÁffœÁ)\©Á= ³Á)\ÁÁÍÌÁq=¸Á33™ÁìQ°Á33©ÁR¸ºÁ)\±Á¸›Á¤pÁìQ¦Á×£˜ÁR¸’Á\ÜÁHáÊÁ\°Á®G±ÁffžÁ= ›Á®G“ÁázšÁq=’Ááz¢Á…™Á…ÇÁ= ¡Á…‘ÁR¸Á¸™Á= §Á…ë¡ÁÃõ²Á…Á ×ÁÂ‘Á)\ÉÁ{¤Á ×¥Á ×©Á…¡Á{ªÁÂ‡ÁÍÌÀÁš™§Áö(˜Á®G³ÁHážÁHá¨Áö(ÈÁáz¢Á33ÃÁö(¨Á×£¢Á¼Á¢ÁÂ™Áö(ÂÁ= ¡ÁHá Á ×ÉÁR¸ªÁìQ˜Á×£®ÁÃõ–ÁÍÌºÁ= ›Áff’ÁÍÌÁq=ªÁ{ˆÁ{˜ÁÂ™ÁìQ¬ÁR¸²ÁšÁ ×™Á= ‘ÁffªÁ= »ÁR¸¬ÁR¸–Á{ˆÁq=šÁÃõ°Á ×±Á®Á×£˜ÁHáÁ…¡Á¸™Á…ÁÁÂ§Á¤p©Á33§ÁžÁázˆÁ\–ÁffžÁö(¦Á®›ÁR¸ªÁÂÁ¸¡Ááz°Á¸«Ááz˜Áq= Á\ÎÁ\ÒÁ\¸ÁÂ™ÁÂÁ…—Á ×¡Á¤p©ÁHá¸ÁÂ¡Áö(˜Á®³ÁÃõ¤Á ×¯Á…³Á¤p¯Á˜Á ×Á\´Á)\›Á¤p¡ÁÃõÄÁff®Á\¨ÁÃõ–ÁÃõªÁ\ÈÁÃõžÁáz°ÁR¸˜Á33¡Á{´Ááz¬Á¸Á…‘Á…ëÃÁ×£¢Á¸§Á\¼Ááz’Á{¬ÁHá¨ÁR¸¨Áö(šÁ¸©Áq=ÊÁ¸³ÁÃõ ÁÃõ¨ÁìQ Á¤p›ÁÆÁ¤p›Ááz¢Á\ªÁ×£¸Á…»Ááz¸Áš™©Á¸ÁÁŽÁÁ\²Á)\¡Á ×»Á= ¡Á¸ÁÁÍÌ®Áff¢Áq=Á= ‰Á¸¥Á{˜Á= —ÁÃõÂÁ\¬Á{°Á{ÎÁºÁ33™Áö(–Á×£¾Áš™«Á×£ Áff¢Áš™³ÁÂ©Áö(ÁÂ¡Áö(¢Á)\¯ÁHáœÁ Á®G©Áö(¢Á…ë§Ááz¨Á®G¡Á{¦Á×£²Á®Á¤p¡Áff°ÁÍÌ¸Áš™³Áq=¢Á…±ÁÍÌ¨Á\ÊÁš™ÁÁ®Á= «Á®™Á×£˜ÁÂ±Á{’ÁR¸¦Áš™§Ááz¼Áq=°ÁÂ—ÁÃõ¾Á= ¥Á33£ÁÃõ¨ÁìQºÁ®©Á×£ÒÁff¨Áš™¡ÁÍÌ°Á…ëŸÁ®G«ÁÃõœÁÂµÁHá–Á\ ÁR¸¢ÁìQ¨Á®ŸÁ\’Á®G™Á= ™ÁÃõ¬Á®GÁ®©Á ×¡Áff¢Á= ¯ÁR¸¢ÁHášÁ¤p»Á\–Á{œÁÃõ²Á¤p‘Á{ÀÁ…ŸÁö(¾Á¤type§scatter¤name®Expected Sarsa¡xÈÐ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ 
C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC€ÈCÉC€ÉCÊC€ÊCËC€ËCÌC€ÌCÍC€ÍCÎC€ÎCÏC€ÏCÐC€ÐCÑC€ÑCÒC€ÒCÓC€ÓCÔC€ÔCÕC€ÕCÖC€ÖC×C€×CØC€ØCÙC€ÙCÚC€ÚCÛC€ÛCÜC€ÜCÝC€ÝCÞC€ÞCßC€ßCàC€àCáC€áCâC€âCãC€ãCäC€äCåC€åCæC€æCçC€çCèC€èCéC€éCêC€êCëC€ëCìC€ìCíC€íCîC€îCïC€ïCðC€ðCñC€ñCòC€òCóC€óCôC€ôCõC€õCöC€öC÷C€÷CøC€øCùC€ùCúCÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/97d5d32b3ca95403„¦layout„¥xaxis¥title¤text°Walks / Episodes¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks 
[Figure: Empirical RMS error, averaged over states; legend: TD α = 0.05, α = 0.15, MC α = 0.01, α = 0.02, α = 0.03, α = 0.04]
[Figure (two runs): Driving home example, predicted total travel time by state (leaving office, reach car, exiting highway, secondary road, home street, arrive home); left panel: Monte Carlo Prediction vs. actual outcome with Monte Carlo error; right panel: TD(0) Prediction vs. actual outcome with TD(0) error]
[Figure: Value Iteration Policy Path Example; x-axis: Wind Values; optimal path from S to G]
[Figure: Sarsa policy Path Example; x-axis: Wind Values; path from S to G]
[Figure: Average steps per episode vs. Episodes]
during training¥range×ÈB¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data’„¡yÈÐ®gDn5C¸žÚB=Š°BÂ´BÂ±Bö¨¥B)ÜB…ˆBÍLBš™tBq=rB= xB33WBHáVB= ^B…ëZBö(4BHá:B\:B…,Bq=2B®G7B®G%B¸#Báz#BÂ#BffBHáBffBázBš™7B…BR¸úA{B)\ñAÃõüA…åA®ùAš™ùA\ôAš™çAÃõàAffÔAÂ×A)\åAq=àAffÚA®ÕA)\ÝAR¸ØAHáÐAHáÞAq=ÂAázÀAÂ½AHáB33ÙA…ëÁAR¸ÈA)\áA= BffÎA ×åA)\ñA¤pÍA®ÝA®ÃA®ÉAÍÌÊAÍÌÊA= ¿AìQÈA®GßA{¼A\ÎAff¼AHáÒAHáÄA×£ìA¤põA33¹A®GÁAq=ÈA ×±AázªA…µA\°A= µA\ÎA…µAö(²A…¯Aö(¸Aq=¬AÎA×£¸A= ³A…ë»A= «A)\¯AªAÃõÀAffºA= ÙAÍÌÔA…ËA®GBÂÏA33ÝAq=úAÍÌÜAázÊA ×ÅA…ÇA{ÆA¤p»A¤pÑA¸»AÍÌ¸AázÔAR¸ºA)\»A{¼AÍÌ¸A)\·A¤pÓAö(ÌAff¼A¤pÁAÍÌ¸Aáz¼AHá°AR¸²A{ºA…½AR¸¾Aq=°Aö(¬AÂ©A×£°AÍÌ®Aq=°A×£¼A\¼AÍÌÆA®ËA\¶A\¼A¸Aq=¬A33ÕAÂÏA{ÂA…ë·A…ÏA= ÕA= ¹A= óA ×³A®G½A®G¹AªAö(¬Aff¨AR¸´A®G«AºAR¸ÀA33«AHáºA…ßAff¸A{ÐAÂ³AÂ»A®§AÍÌæAš™ÁAHá²AÂ³AºA×£°A)\µA= «A\¦AR¸ºA…»A)\·A{ÄA¤pãAHáòA ×ÍAáz®A)\¯AÃõ°AR¸´A¤p«A)\µA ×©A)\¯AHá°AR¸¤A)\©A®G©AHá¤A33AR¸¤A…±Aö(¨A\¢A…§A\²Aš™A\¬AffªAÃõ¦A…©A= ½A{ºA)\¯A…ÏA…ë«AffªAš™³A)\µA®G·AÍÌ¶A ×A×£´A…ë¯A®G«Aáz²AìQ¬A…¯Aö(°Aq=B33¿Aq=´A®G¥A ×§A)\A\¨A{¶A¤p«Aš™£A×£°A¤pµAR¸¢A®G¯A33£AìQªA¢A33ÍAÍÌ¨AÂÃAö(ªA{¨A¸©A)\³AffªAÂ¥Aš™«Aq=®AHá®A¬Aq=¬A33«AÍÌªA33±A×£ªAff¨A= A®G¯A®«AÂ¥Aö(¦AÃõ¢A…£A…»AÍÌ°A= ÅAR¸®A33Aš™A®GAq=¬A¸ÝA®G«A= µA¤p«A= »AÃõ¦Aáz²A¸©A¤p±A33«Aö(ªAq=¦A{¨Aff¬AÍÌÊAÂ¿AázâA ×±A…µAš™³A\²A…ë¯Aq=¨Aö(¶A= ±A ×«A)\AÃõ°A¸·AÃõ¨AÃõ®A×£¢A)\¥Aáz¦Aš™AÂ«A×£ªA\´Aš™§AÍÌ¢A…ë©AÃõ¤Aö(¤AÃõ Aff¢Aáz¦AÂ¡AHá´A×£¦A)\³A ×±AHá°A ×A…ÇA®«A…«A®GÅAáz BÂ½A…ë·Aq=¶A®±A ×«AìQ®AÂ³A…ë©AR¸¦A®G«A{°AR¸¦Aff¦AÍÌ¨A…©A¤pAÃõ¨Aö(¦A®¡Aff¤A{¤AìQžA¾A{ÊA= £AÂ§A…ë¥A= ©A33ŸAHá A= ÁAff¨A…ë¥AÂ¡AìQ°AÍÌªA\¤AR¸¶A33A)\«A×£¬A°A¬AìQ¨A¸©Aö(¬A…ë±A= ©Aš™µA= ¥A\´AázÈAºA ×ÇA×£ªAR¸¾A×£¸Aq=®A= §A33§Aö(¬Aö(°A®«AÈA)\A¸©Aff²AHá´A¤pA ×©A{¨AìQÌAÃõ¼AÂ·Aq=ÀAö(ªA…¯A33©A²A{´A®ÃAff¨A×£¦Aš™§Aö(ÐA{¦A…±AÍÌªA…§Aff¦A¤p«A²A{žA…¡Aq=¢A…ë¹A= §A33½Aff´A= A{¨Aö(¶A×£¬A…ÃAff¸A®½A)\£A33µAR¸¬A\ªAR¸¬A…ë«Aáz¨AìQ¬Aq=ªA¸«AÂÓAö(²A\ÂAìQ¶AR¸¨AÂAHáºAš™¯A…±A\®A®G¥A33©A= ¹AÂA¸±A®»A®G©AìQÎA ×ÁAö(ÄA)\¯A= §A…AffºA)\§A ×±A×£ªA®±Aš™©AÍÌ´A…ë±Aq=¬A33½AªA…«A{²AìQªAÍÌ¢Aáz´A¤type§scatter¤name¥Sarsa¡xÈÐ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A 
A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC€ÈCÉC€ÉCÊC€ÊCËC€ËCÌC€ÌCÍC€ÍCÎC€ÎCÏC€ÏCÐC€ÐCÑC€ÑCÒC€ÒCÓC€ÓCÔC€ÔCÕC€ÕCÖC€ÖC×C€×CØC€ØCÙC€ÙCÚC€ÚCÛC€ÛCÜC€ÜCÝC€ÝCÞC€ÞCßC€ßCàC€àCáC€áCâC€âCãC€ãCäC€äCåC€åCæC€æCçC€çCèC€èCéC€éCêC€êCëC€ëCìC€ìCíC€íCîC€îCïC€ïCðC€ðCñC€ñCòC€òCóC€óCôC€ôCõC€õCöC€öC÷C€÷CøC€øCùC€ùCúC„¡yÈÐ¸øC‚#C WÙB»Bö¨–BáúšBÍÌœBfæB…kŠBR¸vBìQqB{VBÂ[BR¸FBÂHB\AB= 2BR¸/Bö(*Bq=Bq=Bö(B®BB¤p B…ëB)\Bš™ûA ×BffìA\öAázêA¤pçA…çAš™ÙA{ÞA= ÝA×£ÐAq=ÜAff¼AìQÊAR¸ÈA= ¿AffÀAHá´A…ëµA¸³AÃõ¶A¤p«A×£¢Aö(¢A{¦A…ëŸA)\—AHá A…AÃõŠA…ëAq=AR¸ŠAö(ˆAffŽAffŽA¤p‰Aq=ˆAq=ŽA{ŠA= A= ‰AÂ‰AHáŠA\†AR¸ˆA)\ƒAÂ‹Aš™ƒA¸…AìQŒAffŒA…ë‹Aff†A ×A33‘A{–A ×‰Aš™AÃõˆAŒA¤pA¸‰AˆAq=ŠAìQˆA…ë…A¤p‡A33‰Aö(†Aš™ƒA)\Aš™A\ŠAff†A33‹A…‹A¸…AìQ–AHá†A…—AÃõŠA…AìQ†AffŠA\„AŠA33‡AR¸ŠAÃõ„A¸A…AR¸ŒA= ‡A…ëyA33A…ëƒAÍÌŒAÃõ„AÃõ†Aš™‡A\†A¤p“Aš™A ×‹A{ŽAÃõ†AÍÌŠAö(ŽAš™…A®G…AffŽAffŠAÍÌ„AR¸A{†A¤pAÂA×£ŽAÍÌ†A33…AìQ„A33A®‰AÂ‡A{†AHáˆAÂ…AÃõˆA33A…ëA¤pƒAff„A= ‰A×£ˆAR¸ŒAq=ŒAR¸†A\†Aö(ˆA®‰AÂ‡A ×‘AÃõˆA×£ˆAŒA®G‡AˆAq=†AŒAìQŒA…‡A®G…A= A¸…AffAö(„A\†AffˆAìQŒA\ŠA)\…A{ŒAHá€A\„AHáŽA…A= …A®AÂ‰AÍÌ€AázŠA®A…AÂ…A33ƒA ×‰AìQŒAq=AÃõxA)\…Aš™A33‹AffˆA)\Aö(”AR¸„AHá„AHáA)\AÂ‰Aq=ŠA ×ƒA ×ƒAR¸ˆAA ×‰AR¸†AHá†Aq=ŒA…ë…A33ƒA{ˆA…ë‡Aö(†A®G‰AHá„A®‹A\ŠA)\A¤p‡A{ŠAq=†AÃõ†A¸‡A¸…A)\‹A®G‰AŠAÂA ×{AR¸ŒA)\‡A{†AÍÌ‚AŒAš™}A33‘Aq=†AÃõ†A®G‹A33A)\wA ×…A33AÃõ†AÍÌŽA ×‰AÃõ†A= ‰A ×AR¸†A…‡Aq=ŒA…ë‰A¸…AÂA= ‰AÂ‹A¸‰A…‡AR¸ŠA)\‡AìQ‚AR¸ŠA…‹A\ŒAHá†A…A…ƒA®G‘A®G‰AffˆA{ˆAÃõ‚Aáz’A= ‰A…ƒA®G…A×£€AìQŽA)\ƒAq=ˆA ×…A®…A†A×£ŠAŽA¸‡AHáŠA¤p‹AffŠAö(ŽAffŠA¤pA®G…A{ŽA®…AÃõ†AìQ‚A{‚A×£†A®G‡AÃõŒAq=ŽAÃõ†A)\‘A®G‰Aö(„A= ‰A ×A¸Aš™‰AÃõ†A= 
A¤pAŠAq=ŽAáz†A)\‡AffŒA¤p‡Aff†Aš™A……A…ë‡Aö(ŠA…AìQˆA\„Aq=†A®G‡Aö(ŒAáz†A®G‹AR¸€A”A)\ƒA…‰A…ëAq=ŠA×£|A33‡A¤p…AìQAŒAÃõŒA)\Aáz„A33…AázˆA ×…Aö(†A ×‰A ×‡A…ëƒA†A)\‘Aö(ŠA…Aq=„Aš™‰AHá’AffˆA€A= ‡AÃõŒAÃõ†A®G‰A¤p‰A= …A\ˆA ×‰A®G‘AHá€AHá†A…‰A¤p…A ×ƒAff†Aáz„A{zAq=ŽA×£ŽA= …A\‚AffŒAš™A= ‹A®G‰Aš™‡Aff†AÃõˆAÂ‹Aš™‰A33‘A33A\„A®ƒA…ë…AR¸ŽAš™AÃõ„AázŽAq=„A…ë‡A……A\ˆA¤p…A…‡AÂ}AHá~A…ë‡AR¸†A\„AÍÌ„AHá†Aö(ŠAq=~A×£ˆAq=„Aö(†A‚AffŠA)\A{ŽA= A¤pA¸…A×£„AR¸~AÃõ†Aq=A…A{ˆA×£„A¸A= …A×£ŠAq=‚Aq=‚Aš™‰AÃõŒA×£”A®G}AÂƒA®G‹A)\‹A ×ƒA…ë‰AÂ‰AR¸‚A¤pA ×‡A®‡A…ë‰A®‡AHá†A\ˆAÍÌ€A¸Aš™‘A33‹A¤p…Aö(„A…ë•AìQ„AázAázŒAÃõŒAHá‚A¤pAš™…A®A×£†A¸A)\ƒAR¸„A= ƒA33‡AŒA……A= A¤type§scatter¤nameªQ-learning¡xÈÐ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC€ÈCÉC€ÉCÊC€ÊCËC€ËCÌC€ÌCÍC€ÍCÎC€ÎCÏC€ÏCÐC€ÐCÑC€ÑCÒC€ÒCÓC€ÓCÔC€ÔCÕC€ÕCÖC€ÖC×C€×CØC€ØCÙC€ÙCÚC€ÚCÛC€ÛCÜC€ÜCÝC€ÝCÞC€ÞCßC€ßCàC€àCáC€áCâC€âCãC€ãCäC€äCåC€åCæC€æCçC€çCèC€èCéC€éCêC€êCëC€ëCìC€ìCíC€íCîC€îCïC€ïCðC€ðCñC€ñCòC€òCóC€óCôC€ôCõC€õCöC€öC÷C€÷CøC€øCùC€ùCúCÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/90f5c347caa747c8„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text¾Sarsa policy
Path Example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×`@`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@°@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@Ð@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@ð@¤type§scatter¤name¬Optimal Path¡x×ð@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@Ð@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@°@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×(AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×AAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/d2eeaee44f48b8a0„¦layout„¥xaxis¥title¤textªTime steps¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‚¤type£log¥title¤text±Steps Per Episode¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘ƒ¡yÈ¨€‚D@D°B BCÊBøACBøA¾B,CÈAðAÈA A CCBlB–BB B†B(BˆBHBšBèAÈAäBÆBBXB–BpBB¤BŒBHBB¤BèAâBÈABtBÈAXBXBÂB‚B BlBàAB4B„B8BLBèATB@BàA(BøABB¸A,BðABDBlBàAHB ABàAÐA AÐABˆAàABÀAÐA AÐAœB¸AÀA¨AB8BBàABBB ABàAB¨AB°A$B°AÐA¸A€AàAÀAÈABøA¨AÐA¨AA¨A°AÐA¨AÀA°ABpA AA A AˆAA€AÀA€AˆAÈAÐA A˜A€A4B€A°AÀAÀAÀAˆABÐA A¸A¨AˆA AøAB AAÐA°A AA¸A¨AˆA¤type§scatter¡xÈ¨€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BcŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks 
©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤text¨Episodes¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘„¤line¥color£red¡yÈ@€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB 
B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC¤type§scatter¡xÈ@€{D`¶D€àD èD`E@EE E3Eð7EÀ?E0EE QEÀTEZE`\E ^EqEà{E0EàƒEh…Eh†EØ†E ‹Ep‘E`’E@”E–E8šExEð©Eh¬E˜¯Eø¯E¸¸EPËE ÏEhÓEÔEPÕEØE°ÙEpÚEHâE€ãEˆäE@çE8éEˆìEHíEðïEñEhñE8÷EÐ÷E°øEØúEüE(ýEpþE0ÿETFDF@FäF\F°F€FìFLF@F<FÀF,FlFäFüFxFèFÐF|FôFüF´FÔFüF¼FLF8F„ F8"FT$F%Fð(FÌ)F¬*F0+FÄ+FH,F$0F”1FØ2FÈ3Fè3F@4Fh4Fà4F5Fl5FÀ5Fè5F6Fˆ6F¸6Fð6F\8F¤8Fð8F9Fˆ9F,:F\:FÈ:FD;FDFH>Ft>FÈ>F@@FXAFhCF(DFLDFÔGFHF JFKF8KFDMFdMF\QFìQF¼TF|UF@XF€[F\Fà\F_FaF¬aF$bFpbFdFÔdFfF¼hFÜhF¼iFÄkF mFxmFnFxnFhoFxpFdrFÄrFàrF0tFdtFDuF¨uFLwFÌwFˆF€ˆF¸ˆFj‰FtŠF®ŠFÆ‹FŒFàŒF&FÂF‚ŽFXFª‘F>’FÆ’F:“F¸•F¼—FH˜F:šF†›F¸œFFêFâžFÜŸF: FÀ£Fò¤F”¥F>¦F†¦FØ§F©F:©F¬©Fô©F,ªFtªF|FT®Fb®FÄ¯FÐ°F¾²F<´FŒµFÀµFÞ¶F@·FÜ·Fü·FÒ¸Fb¹F†¹Fø¹F0ºFÄºF|»FÊ»F|¼F½F¾Fþ¾F(ÀFnÁFžÁFpÂFÆÂFôÂFbÃF®ÃF6ÄFxÄF˜ÄFÂÄFòÄFÅFÅFJÅFæÅF6ÆFÎÆF$ÇFRÇFšÇFÎÇF ÈF\ÈF„ÈFÉF^ÉF‚ÉF.ÊFTÊF†ÊF´ÊFâÊF:ËFXËF|ËFúËF&ÌFÚÌFÍF˜ÍFÎF‚ÎF’ÎFÞÎF>ÏF†ÏFâÏFZÐF ÐFâÐFêÑFTÒF¤ÒFöÒFpÓFðÓFÕFÎÕF¬ÖF×F<×F¢×F2ØFNØFàØFÂÙFÚF|ÚF.ÛFdÛF¬ÛFÜF”ÜF¬ÜFÒÜFÝFHÝF–ÝFÂÝFÞF.ÞFVÞF„ÞFÊÞF<ßFâßFVàFŠàFöàF:áFráFèáF|âFÆâFöâF6ãF‚ãFÄãFäF2äF\äFœäFÌäFæäFåF<åFtåF†åFÞåFæF*æFBæF€æF´æFÆæFçF4çFTçFÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/536a0a4e512619da„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text¿Random policy
path example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜf‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? 
@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@°@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@°@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@°@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@°@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À? 
@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×`@ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×`@`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×`@ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? 
@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×`@`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×`@ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×À? 
@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×@`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@°@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@ð@¤type§scatter¤name¬Optimal Path¡x×°@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×°@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×°@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×Ð@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×Ð@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×Að@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×AAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/4e7985c38cb01320„¦layout„¥xaxis¥title¤textªTime steps¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‚¤type£log¥title¤text±Steps Per Episode¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘ƒ¡yÈ@@?D€ÅC€ÁC-CtBKC®BøB8BBìB%CþBøAC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈCÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/21f195b5663a5875„¦layout…¥xaxis¥title¤text¾Number of Samples Per Variable¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤text¸Estimate of Maximum Mean¥titleÙ2Maximization Bias for IID Variables with Zero Mean¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data”†¤mode¥lines¤line‚¥color¥black¤dash¤dash¡yÈ¤type§scatter¤nameªTrue Value¡xÈ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BM§>Î ‘>E¸ƒ>m>ÂÆZ>“QN>¡#C>À37>Ò£.>º®&>T>ò¬>;×>¨>M#>O>k–>™>ß~ý=èö=ð=H”ë=æ= ýà=f,Ý= sØ=ž=Õ=@¯Ñ=•3Î=Û‘Ê=RQÇ=¿tÄ=ÖcÂ=³¿=óÊ»=ã¢¸=´ö¶=%w´=XK²=j˜°=.=®=šØ¬=\|«=gª=âº§=]¦=„¤=‘œ¢=œ1¡=¡)Ÿ=k§=tø›=›=XŠ™=ñæ—=àÙ–=S›•=¥”=åh“=úÓ‘=k[=ÅóŽ=Äk=:‡Œ=·I‹=Š=9dˆ= À‡= +‡=ö—†=õº…=§h„=[ƒ=7¿‚=T½=âÀ€=L±=[T~=Ót|=œN{=úëy=Ârx=}‡w=XÁu=ús=÷ær=¨q=jÉo=1Èm=¸êk==!j=NŸh=bg=é(f=› e=üÊc=b.c=Lça=¤type§scatter¤name«2 variables¡xÈ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8B½9Ù>WJÄ>´±>]·¤>ûûš> Ï’>TŠ>Iƒ>TÁz>JÙo>‹¶g>‘€_>Y>ê¯S>¹M>S¥G>?jC>êˆ>>'Œ9>!²5>x 2>Ø¹->˜*>‡ù&>,Š#>‹Ù >[×>§>ß>Å>Çø>]Ù>”·>ì >>>qÜ >-* >*€>ét> ì>ïw>94>’>{€þ=ëâû=}ù=§Iö=?ô=àÞñ= ð=ý¿í=2 ì=f,ê=‰8ç=Ô4å==Fã=Y”á=ÌÒß=IkÝ=šÛ=¼óÙ=uø×=POÖ=9®Ô=X¨Ò=«ÈÐ=(¥Ï=\‹Î=E¦Í=)Ì=ÓCÊ=~ãÈ=;ÅÇ=F Æ=¶”Ä=ÝœÃ=ÂpÂ=]Á=—‹À=J‰¿=ˆ@¾=‹V½=tÙ»=”º=¼v¹=»Ÿ¸=)#·=$âµ=Ëª´=@l³=ÅC²=ô°=e°=4A¯=áj®=†¯=®ô¬=¤type§scatter¤name«3 variables¡xÈ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB 
B$B(B,B0B4B8B’o×>gwÇ>-òº>¶°>[ï¦>÷§ž>´³—>EN‘>ƒVŒ>J?‡>šPƒ>_î~>Àvw>x™p>ï=k>I¿d>54_>W Z>µÔU>.ÕP>„L>UI>eJE>(¿A>½a>>ˆ;>–8>¶6>bÙ2> 0>íä->ãÄ+>–)>QÊ'>Þ%>â#>€">µˆ >;è>ÚA>›Ð>>Ø<>;]>µÛ>)‚>N>7×>Šv>Œ.>8 >[m>Mf >È7 >å >û>ÌØ>´>D²>—½>ÄŸ> â>ö/ÿ=gîü= ?û=¡±ù=:ø=$7ö=í"ô=…>ò=AÏð=[,ï=n¤í=®\ì=@ïê=né=¿Dè=Þêæ=»å=²ä=ãmâ=Ùá=Èªß=‘~Þ=ìÛÜ=N€Û=Ü'Ú=[õØ='„×=ÐåÕ=PþÔ=ÂÔ=£üÒ=*Ò=¿"Ñ=¤type§scatter¤name«4 variables¡xÈ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BcŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@¥range×€?€@¦mirrorÃ¨ticktext“   ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊCH¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data–‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ`@¤type§scatter¤text‘¡G¡xÖ`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @`@Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/7c2857752627f863„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks 
©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text¾Sarsa policy
Path Example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataš‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @@¤type§scatter¤name¬Optimal Path¡x×ð@AÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/6aa5ac91f9de9235„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ0€?@@@€@ @À@à@AA A0A@A¥range×€?PA¨ticktextœ            ©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title ¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsØ€?@@@€@¥range×€? @¦mirrorÃ¨ticktext”    ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text½Cliff Walking Q Learning Path¡xÊ?¥widthÊC´¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡G¡xÖHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×À? 
@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×°@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×°@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×(A8A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×8AHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×HAHAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/afbc8d42c8c4fc44„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data™‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @@¤type§scatter¤name¬Optimal Path¡x×ð@AÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/ebe8d19277071b89„¦layout‡¦xaxis1ˆ§yanchor¦bottom¨tickvals× A¥title„¤font¤sizeÊA ¨standoffÊ?€¤text¹# Cars at second locationªautomarginÃ¦domain×ffæ>©linewidthÊ@¦mirrorÃ¦anchor¢y1©linecolor¥white¦yaxis1‡¨tickvals× A¥title„£pad¡lÊ¨standoffÊ?€¤text¸# Cars at first locationªautomarginÃ¦domain×€?©linewidthÊ@¦mirrorÃ¦anchor¢x1©linecolor¥white¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦xaxis2ˆ§yanchor¦bottom¨tickvals× A¥title„¤font¤sizeÊA ¨standoffÊ?€¤text¹# Cars at second locationªautomarginÃ¦domain×ÍÌ?€?©linewidthÊ@¦mirrorÃ¦anchor¢y2©linecolor¥white¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¦yaxis2‡¨tickvals× A¥title„£pad¡lÊ¨standoffÊ?€¤text¸# Cars at first locationªautomarginÃ¦domain×€?©linewidthÊ@¦mirrorÃ¦anchor¢x2©linecolor¥white«annotations’‰§yanchor¦bottom§xanchor¦center¡yÊ?€¤font¤sizeÊA€©showarrowÂ¤yref¥paper¤textª$\pi_{41}$¤xref¥paper¡xÊ>fff‰§yanchor¦bottom§xanchor¦center¡yÊ?€¤font¤sizeÊA€©showarrowÂ¤yref¥paper¤text®$v_{\pi_{41}}$¤xref¥paper¡xÊ?Fff¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data’‰¨colorbar©thicknessÊ@¥xaxis¢x1¡yÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¤type§heatmapªcolorscale¤RdBu¥yaxis¢y1¡zÜÇT€?@@@@€@€@ @ @ @ @ @ @ @ @ @ @ÇT€?€?@@@@@€@€@€@€@ @ @ @ @ @ @ @ÇT€?@@@@@@@@@@€@€@€@€@€@€@ @ @ÇT€?€?@@@@@@@@@@@@@@@@€@€@€@ÇT€?€?€?€?@@@@@@@@@@@@@@ÇT€?€?€?€?€?€?@@@@@@ÇT€?€?€?€?@@@ÇT€¿€?@@ÇT€¿€¿€?€?@ÇTÀ€¿€?@ÇTÀ€¿€¿€?@ÇTÀÀ€¿€?€?ÇT@ÀÀ€¿€?ÇT@ÀÀ€¿€?ÇT@ÀÀ€¿€¿€?ÇT@ÀÀÀ€¿€?ÇT@À@ÀÀ€¿ÇT€À@ÀÀ€¿ÇT€À@ÀÀ€¿€¿ÇT€À@ÀÀÀ€¿ÇT€À@À@ÀÀ€¿©transposeÃ¡xÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A‰¨colorbar©thicknessÊ@¥xaxis¢x2¡yÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¤type§heatmapªcolorscale§Bluered¥yaxis¢y2¡zÜÇT† ÒC˜×CÐMÜCô·àC:ÄäCýÀèCvìC.wðCž)ôC÷¹÷C-ûCÑeþCÞÂD÷;DÌ¡DgöD;DTpDÈ–De® D£¶ 
DÇTÿœ×CÜCPJáC\´åClÀéCø¬íCÒvñCH)õCÀ¹øCñüC±eÿCÑBD»½D(-DŽˆDeÒDD§6 DuR D¹_DË]DÇT–ŠÜCÀzáCÒ7æC[¡êC|¬îCfvòCæ(öCp¹ùC«ýCÀ2D¹ÂD=DD‚D[RD›ŽD„¼ D4Û DXëDíDÆß DÇT&XáCDHæCâëC*mïCïuóC}(÷C¹úCOþC²D˜BD—½D-DpˆDGÒDŒ Dx< D)[DOkDýl D»_D3FDÇTöåCæêC¾¡ïCÈôCqøC¦¸ûCèÿCq2DkÂDp=Då¬DMD'RD|Ž Da¼ DÛD6ëDåì D§ßD&ÆD±œDÇT€[êC6KïC·ôC¢høCŠgüC|D9²D5BDF½Dº,D$ˆDýÑD\ DA<DùZDk DÌlDŠ_DFDœDnâDÇTê†îCJvóCE/øCuŽüCCCDÊD“µD=D†¬DîDÏQ D4Ž D¼DÒÚDöê D§ìDhßD÷ÅD†œDYbDQDÇT±zòC´i÷C!üC >De6DkýDÖ›DVD…DÆÏ DDê;D§Z DÇjDlD<_DÙED^D<âD9šDÝCDÇTAiöCC)ûC¿ÞÿCÆD_DÙÐDûgD—ÝDœ8 DÍ}DV°DJÒ DõäDéDÍÞDÅD6œD"bDDÄÃDÊ^DÇTÐ(úCÞþC-¶DWßD'ÐD1ŒDR D}‰ DÏÚDj DÁ<D SDZDJRD)<D(DôáDì™DCD²ÞDXlDÇT¯ÝýCà5D³fD~DjzD1 D|º D©Dôg D¡˜DoµDÛÀD‰¼Dn©DÉ‡DFWD©DsÃD•^D;ìD›oDÇT©µDbæD( D¾&D§ DÁ DÜCDw¡ DBáD°DŸDÅDØ DõïD^ÃDÝ‡D@<DoÞDlDcïDgDÇT*fDâŒDq¦D™¬DJ‘ D.=Dœ¹ DøD½GDyfD-pD³gDÏND´&DÊïDÐ©DÝSDðëDBoDVçDÔSDÇTµD*&DH,DD DzDÓ¦ DáD,lDOœDÕ²D×³DB¢D€D[ND Dç½D^D·ìD3gDžÓDÚ4DÇTü¥D¬DøŸ DÛ‚DY^ DõþD–nDý¶DÃßDîDJçD-ÍD ¢DfgD‘D”ÄD¥[DOáDxSDŸ´Dœ DÇTÎ+D¯ DDÞD«DSFDƒ¯DñD·DDDÜèDzµDOrDÛD+¾D³LDÊD}4D^ŠD ÚDÇT†ŸDJ‚ DÕ]Dk+DÙèDR}DáßD®D[5D‚5D$DZõD2ºDþnD„DÊªDI1Dí¦D< DÀYD žDÇT D¡ÝD&« DhD9DG£D!ÿD.3DGDY@D#DòDƒ¯Dî\DìúDª‰DÖD_wDÔDÓDBUDÇT}]Dð* DMèDõ”D—.D¶DDe8DµEDo8D€DÖÜDŽ“D:D&ÑDYD‡ÑDŸ9DbDýÔD»DÇTØªDhD°DT®DÓ5D±DWÿD&D-D´D²ïDì±D|bDÔDÁ“DˆDò‡DYêDç;Dò{D«DÇTûç D{”D.D•µDÝ0DfD*ÕDÃõD÷DÞD·®D˜kDÊDÂ±DV=DÚ¹D)'D»„DÙÑDDC:D©transposeÃ¡xÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A AÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/f51c1fa00f167ddf„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ0€?@@@€@ @À@à@AA A0A@A¥range×€?PA¨ticktextœ            ©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title ¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsØ€?@@@€@¥range×€? @¦mirrorÃ¨ticktext”    ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text½Cliff Walking Q Learning Path¡xÊ?¥widthÊC´¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataŸ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡G¡xÖHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×À? 
@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×(A8A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×8AHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×HAHAÙ49c6be96e-38f7-11f0-2d30-a71f02755abc/d3a9386ca62c618„¦layout…¥xaxis¥title¤text¨Episodes¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥titleÙ"Episode Length for Noisy Gridworld¥yaxis‚¤type£log¥title¤text±Steps per 
Episode¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data•„¡yÇP,TpB=›ŠB¾ŸBç„Bô¬nB4^B†HBþÃABÖÅ;B‡§4Bc'B‰’%BÁè&B¬B¤BóNBY†BÒoBÜBåB¤type§scatter¤name¥Sarsa¡xÇP€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A„¡yÇPy˜fB ƒB4`ŠBœ¢~BÆœjB8§PBMTB9B*ú7B_G.B})BgU$B-á Bÿ2BõÛB.?B ×B¾pB,¥Bz‡B¤type§scatter¤name®Expected Sarsa¡xÇP€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A„¡yÇP…ë_B‡)BIB±! BV_B¡Bý‡BgÄþA/îýAMs÷AFûA?WõA#[õAœóA qòAîëîA€ôA’ËíA}.ëAøÓçA¤type§scatter¤nameµDouble Expected Sarsa¡xÇP€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A„¡yÇP„ÞeB×ÒŠB ÉˆB®XB‚¢lBÎ\Bo0JB’Ë@B´H;B h/B`¥-B½#'BøB(B“X#B¨†Bð…B'BffB$×BÖVB¤type§scatter¤nameªQ-learning¡xÇP€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A„¡yÇPÕ\Bk«)BABþÃBR'B?ÆBÅ~BóÎB‡çB*BýþAS…úA6+úAOžõAiðAðîAq¬ñA˜]éAþåïAøSèA¤type§scatter¤name±Double Q-learning¡xÇP€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A AÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/c925358b3c3d408e„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ(Value Iteration Policy
Path Example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@°@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@Ð@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@ð@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@Ð@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@°@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×(AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×AAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/5e790add5f7b1844„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ(Value Iteration Policy
Path Example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?`@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@°@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@ð@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@ð@¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×ð@Ð@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×Ð@°@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×°@@¤type§scatter¤name¬Optimal Path¡x×(A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×(AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@@¤type§scatter¤name¬Optimal Path¡x×AAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/ada388116d66970b„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ0€?@@@€@ @À@à@AA A0A@A¥range×€?PA¨ticktextœ            ©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title 
¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsØ€?@@@€@¥range×€? @¦mirrorÃ¨ticktext”    ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ!Cliff Walking Expected Sarsa Path¡xÊ?¥widthÊC´¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜ‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡G¡xÖHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×À? 
@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×ð@A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×AA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×A(A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×(A8A‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@`@¤type§scatter¤name¬Optimal Path¡x×8AHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×HAHA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x×HAHAÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/1cb9d5b796f6ec98„¦layout‡¦xaxis1ƒ¥title¤text¥State¦domain×ffæ>¦anchor¢y1¦yaxis1‚¦domain×€?¦anchor¢x1¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦xaxis2ƒ¥title¤text°Walks / Episodes¦domain×ÍÌ?€?¦anchor¢y2¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¦yaxis2‚¦domain×€?¦anchor¢x2«annotations’‰§yanchor¦bottom§xanchor¦center¡yÊ?€¤font¤sizeÊA€©showarrowÂ¤yref¥paper¤textÙ-Estimated Value with TD(0)
with Î± = 0.2¤xref¥paper¡xÊ>fff‰§yanchor¦bottom§xanchor¦center¡yÊ?€¤font¤sizeÊA€©showarrowÂ¤yref¥paper¤textÙ)Empirical RMS error, averaged over states¤xref¥paper¡xÊ?Fff¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data—‡¤line¥color¥black¥xaxis¢x1¡yÇ«ª*>«ªª>?«ª*?UUU?¤type§scatter¤name«True values¥yaxis¢y1¡x•¡A¡B¡C¡D¡E†¥xaxis¢x1¡yÇ?????¤type§scatter¤nameª0 episodes¥yaxis¢y1¡x•¡A¡B¡C¡D¡E†¥xaxis¢x1¡yÇÍÌÌ>????¤type§scatter¤nameª1 episodes¥yaxis¢y1¡x•¡A¡B¡C¡D¡E†¥xaxis¢x1¡yÇÍ¤B>†”Ä>HZç>7}?Î'?¤type§scatter¤nameª7 episodes¥yaxis¢y1¡x•¡A¡B¡C¡D¡E†¥xaxis¢x1¡yÇíR>ö&¬>L"ä>¼¼?ÇØF?¤type§scatter¤name«15 episodes¥yaxis¢y1¡x•¡A¡B¡C¡D¡E†¥xaxis¢x1¡yÇþ¬³=Í¢¢>ãÜî>Ãx ?ÿ=?¤type§scatter¤name«99 episodes¥yaxis¢y1¡x•¡A¡B¡C¡D¡E‡ªshowlegendÂ¥xaxis¢x2¡yÈ”ï[q>yVW>-?>/*>&é>ÉW>èpì=nÕÓ=—œÂ=eù«=³{=ÿE=‹»‡=ÃD„=£†ƒ=+#ˆ=Ðƒ=5¢‰=Ãu‹=Û9‹=O•=M•=æ×™=6¦›=®Xž=#Û=-Rœ=/œ=ˆlž=ó7 =E9¥=~¡=È´¢=8¨=3ã¦=éÂª=ZÔ¨=O:¤=î¤=ât¨=">§=â0®=r‡¯=»Û°=¸Š=‰_«="º±=x;«=p”ª=»¢¨=Œ¦=KÇª=^ ¯=§*±=Å³='´=ÄR¯=)E§=“¨=$b©=Õƒ¦=U’¢=Î¾ž=ý¢=úWª=Ö§=ß-¥=3¥=Ó…¤= *©=È’«=û¬=Õ«=à¨=t¸ª=!°=`~±=ë|±=Ô«=+y¥=]¨=ïE¤=i# =Ðš =[¥=SB§=.ý©=ï‰®=#J«=¤”=h=¦=œy¦=º,¢=¨.¤=5n§=Fó©=hv¯=¸¯=ë<°=Ô®=a_²=¤type§scatter¤name©RMS error¥yaxis¢y2¡xÈ€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BcŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks 
©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤textµ% left actions from A¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data–„¡yÈ°“:?À[?1?¾Á?¼t#??Æ,?áz4?KY6?‡9?Ú#Ûù>‡ù>˜Ýó>*©ó> 
cî>rŠî>çì>KÈç>Ðä>æ?ä>Sã>‘Þ>ížÜ>QÚ>¢´×>=›Õ>Ô>|òÐ>;Í>U0Ê>zÇÉ>ÔšÆ>A‚Â>@Á>và¼>mV½>þCº>õJ¹>a2µ>µ>O@³>˜n²>„ž>òÒ>_˜¬>–²¬>ð¨>]m¥>M¤>fˆ£>ÊT¡>¤p>d;Ÿ>Ù>?W›>þCš>š›>¬˜>O–>†É”>ý‡”>é&‘>)í>¾>ÙŽ>z6‹>q=Š>°rˆ>B>ˆ>M„>9Ö…>ïÉƒ>.€>ÊT>J‚>€>#Ûy>ÇKw>F¶s>"lx>ý‡t>ÄBm>h³j>yXh>Á¨d>fff>Ttd> ×c>›U_>Ò^>?W[>¾ŸZ>_X>cY>OV>PüX>¼tS>aTR>£R>NÑQ>…ëQ>•ÔI>'1H>xK>:’K>jM>¹F>&SE>ÞB>ÜFC>¥½A>E>oD>@>¦›D>€H?>£’:>Zd;>µ;>X9>5>ìQ8>³1>S4>-2>EG2>¡g3> A1>Åþ2>ê•2>O/>²..>h",>é·/>zÇ)>Õx)>0*)>f÷$>œ¢#>8'>Ó¼#>ÓM">›U>|!> >¬> Š>›æ>?5> >và>QÚ>vO>åò>i>¾Á>ã6>Î>ð>š™>«>>¾0>¾0>ôl>™»>s×>s×>óŽ>Ï÷>H>ò°>¼>ó>+>™»>*©>½R>Í;>ª‚>—ÿ>à-><½>*:>ßO >ßO >4>M„ >Ù>–C>ºI>–²>ƒ/>–²>^º >Ìî >ÍÌ>L7 >¨W >§y> >Kê>a>KÈ>+>¼>&S>]þ>ïÉ>p_>L7 >”ö>o>9Ö>o>¥N>¥½>•>¥N>ò>%>µû=o>‘ú=þCú=·bÿ=mÅþ=$—ÿ=ÿ!ý=Žuñ=¤type§scatter¤name¥Sarsa¡xÈ°€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C„¡yÈ°Nb?Þ?à-?mV?òA/?[Ó6ü>¾Ÿú>}®ö>&ó>{ƒï>Œ¹ë>ŒÛè>TRç>8øâ>ÊTá>6<Ý>d]Ü>}®Ö>ØðÔ>O@Ó>HÐ>úíË>q=Ê>]þÃ> ×Ã>$—¿>’\¾>dÌ½>Qkº>™»¶>¡Ö´>¶> A±>ò°°>»¸>±P«>KÈ§>Ë§>¸¯£>Ü×¡>ƒ >’\ž>vq›>~˜>•”>Ï÷“>é·> cŽ>2æŽ>±¿Œ> Š>ƒQ‰>ç‡>ÊT>œ¢ƒ>’\~>È}>ì/{>¡gs>"ýv>iop>jÞq>û:p>z¥l>“i>§h>øÂd>÷u`>và\>QkZ>™»V>u“X>a2U>jM>;ßO>¨WJ>ÌîI>Ì]K>ïÉC>JB>\ A>Ç:>È=>Zd;>È˜;>l 9>Y7>Ù_6>´Y5>×£0>Åþ2>XÊ2>i/>èÙ,>C+>h³*>O/>§(>Ÿ<,> q,>Â&>$(>øÂ$>ÓM">ÁÊ!>R' >@¤>dÌ> ù >®Ø>d;>c>ö(>ú>Ñ‘>âX>«>>O¯>àœ>âX>¼t>*:>˜Ý>)Ë>·>òÒ >)\>ƒ/>*:>á>; >„ >M„ >ƒ/>ñô >(>Mó>M„ >pÎ>ð… >KY>”ö>9E>&S>?>p_>Þ >Kê>Ý$>¹ü>Ý$>Ë>?>J{>¥,>mÅþ=‚s>J{>•>Ü×>ð>¼>L7 >¥½>¶óý=“:>]m>§y>ƒ>‘þ=o>þCú=$—ÿ=ÿ²û="lø=´Yõ=$¹ü= ø=þe÷=#Jû=µ¦ù=Ù_ö=Øó=Žð="ýö=Grù=jMó=F”ö=k+ö=þÔø=×£ð=ÙÎ÷=´Èö=².î=!°ò=Dúí=üó=Dúí=kš÷= cî=×ò=Ù=ù=×£ð=þe÷=kš÷=².î=ioð=h³ê=ê=¤type§scatter¤nameªQ-learning¡xÈ°€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB 
B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C„¡yÈ°7?¥ý>Ž?Îù>úíë>¿}Ý>–²Ì>î|¿>-²>‚â§>¤p>F%•>—Ž>ðˆ>·b>"lx>Åq> ql>øÂd>ˆô[>™»V>áS>aTR>ûK>M„M>®G>?F>I.?>ÜFC>MD>ÜFC>€·@>£’:>"Ž5>}®6>5>Øð4>5>{.>±¿,>h",>Õç*>ff&>ž^)>Õx)>@!>æ?$>Á9#>Á9#>fˆ#>.ÿ!>Òo> >ž>‡>š>ã¥>=›>= >>&>¼>tF>Î>¼–>˜n>û>Îª>*:>ñc>ñô >V>:# >Ý$>ºk >ï8>¹>ð§>¹ü>KÈ>oð>“©>¸@>F%õ=ÿ²û=mçû=ò>þe÷=€·>"ýö=Ûùþ=l ù=äò=ý‡ô=1÷=F¶ó=$¹ü=þÔø=Diï=‘~û=þÔø=ü©ñ=³ñ=×4ï=úíë=ŒJê=úíë=û\í=kš÷=ŒÛè= Òï=±¿ì=D‹ì=Õxé=gDé=B>è=°ç=…ë=°ç=ÓMâ=÷ß=÷uà=ˆ…Ú=Ô+å=ÀÛ=QÚ= à=‹ýå=|á=‰ÒÞ=§è=‹ýå=ù1æ=eâ=õJÙ=Éå=>èÙ=ÐDØ==›Õ=bØ==›Õ=<½Ò=_Î=QÚ=a2Õ=iÞ=ò°Ð=¨5Í=ÎˆÒ=¬‹Û=bØ=žÞ=ªñÒ=HÐ=„žÍ=·Ñ=ÐÕÖ=`vÏ=HÐ=ÙÎ=?ÆÜ=„ Ï=jÍ=òAÏ=…|Ð=_)Ë=ƒ/Ì=¨WÊ=¨ÆË=ƒQÉ=_˜Ì=ñcÌ=ð…É=ÙÎ=;ßÏ=7À=ÐÄ=Zd»=Í;Î=^KÈ=[±¿=®Ç=5^º=^ºÉ=ð§Æ=òAÏ=7‰Á=7‰Á=\Â=Ì]Ë=6¼=6¼=[±¿=òÁ=•Ã=Zõ¹=ìQ¸=ì/»=4¢´=6¼=ësµ=6Í»=j¼=ìQ¸=é·=Å±=Zd»=¤ß¾=¥NÀ=ì/»=Zd»=5ï¸=-²=]þÃ=\Â=¥NÀ=í ¾=}®¶=6¼=X¹=6<½=Év¾=ÇK·=¤p½=[B¾=[Ó¼=33³=é&±=ÆÜµ=4¶=ê´=Å °=Év¾=6<½=5^º=VŸ«=Å±=È˜»=~¸=é·=yX¨=¡ø±=X¨µ=¾°=ÆÜµ=é·=îëÀ=Æm´=Y†¸=z¶=V}®=é·¯=Y·=Ãdª=ê•²=Y·=ìQ¸=ÇK·=ê´=ê•²=}®¶=ûº= q¬=2w=}?µ=O¯=}?µ=¢´·=Å °=5^º=Wì¯=¡g³=œ³=5^º=£’º=V=£’º=XÊ²=¤p½=X¹=6¼=yé¦=}Ð³=¥½=é&±=j¼=XÊ²=5^º=µ=V}®=Ä±®=ÇK·=}?µ=~Œ¹=œ³=¡Ö´= à=-²=µ=¤type§scatter¤name±Double Q-learning¡xÈ°€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C„¡yÈ°³{?6«þ>ÐDø>F%õ>Dúí>–²ì>°rè>(ë>–²ì>çûé> ï>à-ð>s×ò>F%õ>ÇK÷>šû>?Æü>RIý>.?¥½?ê??+‡?†8?‚â?ŒÛ?zÇ ?Ÿ<?².?V?Wì?îë?W[?øS?Ó¼?=,?O¯??¼?¯”?Â?°?+?Y?0*?¹ü?“?l ?•Ô??£#?ž^?,e?c?£#?§?#Û?£#?Õx?é?Õ ?G?‹ý?†8?Ý$?¡Ö?Ø?o?ªñ?Îˆ?åa?œÄ?‰A? ?·b? 
à ?j ?_˜?h³ ?ƒÀ ?Y†?µ7?ç?Â?=›?f÷?8g?o?¡g?Žu?w-?Üh?¥ý>Çú>¬ø>"lø>½Rö>†Éô>{ƒï>„ ï>Mì>0*é>çûé>§è>]må>ÜFã>oá>|á>äÝ>£’Ú> Ø>"ýÖ>ësÕ>aTÒ>à¾Î>ÖVÌ>;Í>ûË>¨ÆË>:’Ë>B`Å>øÂÄ>åÐÂ>vO¾>d;¿>6Í»>šw¼>Háº>Y·>½R¶>¼t³>Å±>)\¯>±P«>¨Æ«>š¨>žï§>/Ý¤>Ô+¥>xœ¢>I >ÒoŸ>¿œ>5^š>Ç)š>¾Á—> ˜>4–>•>”>EG’>jÞ‘>EØ>„ž>V>úí‹>zÇ‰>9´ˆ>9´ˆ>gÕ‡>Ë¡…>.ÿ>¸¯ƒ>ë‚>7‰>¥N€>‘~{>lxz>5ïx>µ{>"lx>4v>´Èv>W[q>z6k>33s> ql>èÙl>¯”e>UÁh>ù1f>Sc>.`>Ñ‘\>®Ø_>ö(\>t$W>™*X>+öW>s×R>Í;N>VN>ÎªO>rŠN>¨WJ>+G>ºIL>9´H>Ê2D>]ÜF>oD>]mE>Ü×A>¶ó=>£’:>mç;>6<=>Gr9>ÇK7>F”6>}®6>17>Øð4>¡g3>2U0>-2>io0>—.>S4>(->V}.>èj+>yX(>žï'>Ãd*>±á)>ç'>Â&>€&>“)>U0*>§(>æ?$>/n#>åa!>øS#>¬>S–!>®¶">?Æ>¿}>¾Ÿ>/>=>>Pü>RI>‡§>™»>™»>âX>àœ>†8>©>Ù>·>à¾>·>¨5 >ºk >Mó>ßà>`v>Þ >ñô >–C>^K>)\>; >_˜>§è>ð§>pÎ>Ë>Ê2>Ì>Þ“>‚s>”ö>ƒÀ >8g>Ýµ>Ê2>8ø>ò>îë>mçû=îZ>·Ñ>µû="ýö=$¹ü=Ûùþ=$—ÿ=‘ú=¶óý=´Èö=ÙÎ÷=Dúí=kš÷=¤type§scatter¤name®Expected Sarsa¡xÈ°€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C„¡yÈ°®¶?vOþ>Ž?ˆôû>{ƒï>Š°á> AÑ>B`Å>5^º>ü©±>h"¬>Ë§>¸¯£>Ñ‘œ>d]œ>mç›>ës•>ÐÕ–>–>F¶“>EØ>Ÿ«>¨5>EØ>×4>D‹Œ>h"Œ>‹>z6‹>_)‹>ñôŠ>èj‹>ð…‰>ù‰>f÷„>pÎˆ>žï‡>ŒÛˆ>ï8…>(‹>oð…>]þƒ>°‡>]Ü†>Z„>ÜFƒ>Z„>A‚‚>&‚>îZ‚>9Ö…>\‚>Ýµ„>÷ä>F…>îë€>7€>¥N€>·b>ÛŠ}>"lx>Ú¬z>îë€>~Œy>£’z>@>³ês>j¼t>¶„|>Øðt>ý‡t>Æmt>EØp>ü©q>Âu>Å p>EGr>1w>u>¡gs>ÖÅm>³{r>z¥l>ûËn>ôl>zÇi>žïg>ÃÓk>gÕg>Ãdj>UÁh>ù1f>Ãdj>.ÿa>œÄ`>-²]>›æ]>ù1f>÷äa>ö—]>w-a>Òo_>šw\>>èY>QkZ>ðV>>èY>>èY>cîZ>ã6Z>½RV>ˆ…Z>ÐÕV>†ÉT>˜nR>¼R>xK>¼–P>shQ>ÍÌL>ºIL>¼tS>)íM>VN>©P>—ÿP>:’K>NbP>KêD>òÒM>ºIL>–²L>ƒÀJ>§yG>qM>ÝµD>%uB>ï8E>ÜFC>?>?F>’Ë?>‘~;>JB>î|?>¥½A>ËG>ƒ@>6<>[Ó<>aC>‘>>H¿=>6<=>kš7>Ù=>¥=>Ú¬:>£#9>H¿=>#J;>Ç:>Ç:>Øð4>!4>þe7>Y7>Zõ9>X¨5>ÆÜ5>é·/>Wì/>²..>{ƒ/>¾0>1,>h",>ü3>èj+>²..> q,>µ&>û\->ÃÓ+>±¿,>…+>Tt$>. 
>-!>ÁÊ!>0L&>R' >|!>Š°!>ŠŽ$>e">žï'>|!>¯”%>œ¢#>-²>QÚ>-C>/>?5>æ?$>i>÷ä!>ö—>ö—>¾Á>Q>ÐD>u>†É>â>d]>‡>¼t>s×>P>tF>«>>tµ>ÐÕ>t$>>ª`>+‡>u>†8>à->…|>Î>ð>ôý>>»'>û>àœ>©>–!>òÒ >_˜>)í >à->¼–>¨Æ>ïÉ>ºk >…|>„ >ºk >š>; > >Œ >ÍÌ>ƒ/>&S>Kê>pÎ>»¸ >:’>“>‚s>?>Ý$>Ë>Ë¡>®>¥,>§è>¦›>>ï8>ÊÃ>Þ>n£>Úü=J{>ïÉ>7>ò>¥,>KY>9Ö>a>¥N>ïÉ>Þ>\>‘~û=Háú=îë>J{>lxú="lø=ÜF>¤type§scatter¤nameµDouble Expected Sarsa¡xÈ°€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C…¤line¤dash¤dash¡yÈ°ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=ÍÌL=¤type§scatter¤name§optimal¡xÈ°€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB 
B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–CÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/ff3e7516945b9e18„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ€?@@@¥range×€?€@¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks 
®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@¥range×€?€@¦mirrorÃ¨ticktext“   ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊCH¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataÜf‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ`@¤type§scatter¤text‘¡G¡xÖ`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? 
@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? 
@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? 
@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? 
@¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x× @ @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @ @Ù59c6be96e-38f7-11f0-2d30-a71f02755abc/bf44e09ac1fcc101„¦layout„¥xaxis¥title¤textªTime steps¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤text¨Episodes¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘„¤line¥color£red¡yÈ@€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¨A°A¸AÀAÈAÐAØAàAèAðAøABBBBBBBB 
B$B(B,B0B4B8BC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈC¤type§scatter¡xÈ@@?D‘D`ÁD×D ÞDøDpE0 EE0EEàEÐ'EÀ)E°,Eð.E6EP9E@E AE DEHEðHEJEÀLE`SE€VE XE@_E@aEbEiE mE@rEÐsE0uEP|E°~EÀEÀEx‚EX…E†E¨†EH‡E°ˆEh‹E¨ŒEPEÈEhE`E˜‘Eˆ’E“EH”EÈ•EX–Eh—EØ—E˜˜Eà˜Ex™E›Eh›EØ›EHœEðœE@EžEàžE`¢E˜¢E £Eˆ£E`¤E¥EH¥EP¦E¦E(§Ep§E¨¨E`©E¨©EPªE0¬E€¬E€E ®EP¯E ¯EØ¯E°E`°E¨°E0±Eˆ±Eè±E ²E`²EÀ³Eø³E8´EPµE¶Eð¶Ex·E¸Ex¸EØ¸E(¹Eà¹EÈºE ¿EÁE(ÂEÈÂE ÃE0ÄExÅE0ÆE˜ÆEÐÇE ÈE@ÉE€ÉEàÉE(ËEÌE€ÌEÍEXÎE¨ÎEÏE¨ÏEàÏE8ÐEpÑEÀÑEHÒEÐÒEPÓEÕEPÕEØÖE€ØEÈØEÀÙEÚE€ÚE¸ÚEðÚExÛEØÛE¨ÜEÝEpÝE@ÞE€ÞE¸ÞEßE`ßE˜ßEÐßEáEPáEˆáEÀáEøáE`âE(ãEãEèãE(äExäE˜åEàåEÀæE¨çEèEÐèEéEèéE ëE0ìEØìE¨íEîEÐïEðEhñE¨ñEàñE òEXòE˜òEÐòEóEHóE€óE¸óEøóE@ôEÈôE˜õEØõE(öEpöE@÷Ex÷EÀ÷EøE@øEØøEXùEùEØùE úE`úE˜úEÐúEûEHûEûE8üEhýEþEHþEˆþE8ÿE€ÿE(FPFxF”FèFF FcŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks 
©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks 
¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis¥title¤text¸Estimate of Maximum Mean¥titleÙ0Maximization Bias for 2 Variables with Zero Mean¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data’„¡yÇÈÉ:L>Ë‰=¯œ=ûXË<øæ<¨_<³2<@ï<+Éó;ãgÏ;/õ²;ÄÔœ;1‹;Öy;Ë˜`;ßòK;õÞ8;³);·ç;Üÿ;©š;Šsý:Ïyì:JEÝ:&QÏ: Ä:n’¸:vÊ®:Å¹¥:R·œ:- •:Oö:©Ç‡:&‚:z:*£n:q¯e:zÄ\:¦ŒT:·#L:ƒE:0Ç>:ìÉ8:„2:° ,:3':"º!:«Š:ê':SÕ:¤type§scatter¤nameµMax of Means Estimate¡xÇÈ@€@À@A A@A`A€AA A°AÀAÐAàAðABBBB B(B0B8B@BHBPBXB`BhBpBxB€B„BˆBŒBB”B˜BœB 
B¤B¨B¬B°B´B¸B¼BÀBÄBÈB„¡yÇÈsp!»UÖº|gº´"º‰(º0¹F¹|í¸”¦O¸`7¹rì±¸@ñ¹/Yû¸¡A´¸ÖÔ¸åÍ¾¸+"ê¸A*ç¸Y>ã¸‚Û¸9%®¸à;¸+èo¸ð5`¸)*‡¸QB{¸F²˜¸M4o¸ÅS¸ŽZ¸$˜P¸Bì'¸Z¬#¸ê¡ü·•çó· Ó,¸{)5¸±'¸G£Ü·VÃ½·r6À·šXé·{´·}PÕ·»™ó·iÖ·=ñ®·¶yÃ·‹jÅ·Û½ ·¤type§scatter¤name³Double Max Estimate¡xÇÈ@€@À@A A@A`A€AA A°AÀAÐAàAðABBBB B(B0B8B@BHBPBXB`BhBpBxB€B„BˆBŒBB”B˜BœB B¤B¨B¬B°B´B¸B¼BÀBÄBÈBÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/ae6d04b38d0be15f„¦layout„¥xaxis¥title¤textªTime steps¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks 
®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‚¤type£log¥title¤text±Steps Per Episode¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data‘ƒ¡yÈ@à—D€ÎCC6C€ÁC?CèA…C$BBÆB4BBLBÀBœBüBøAC?C@CACBCCCDCECFCGCHCICJCKCLCMCNCOCPCQCRCSCTCUCVCWCXCYCZC[C\C]C^C_C`CaCbCcCdCeCfCgChCiCjCkClCmCnCoCpCqCrCsCtCuCvCwCxCyCzC{C|C}C~CC€C€€CC€C‚C€‚CƒC€ƒC„C€„C…C€…C†C€†C‡C€‡CˆC€ˆC‰C€‰CŠC€ŠC‹C€‹CŒC€ŒCC€CŽC€ŽCC€CC€C‘C€‘C’C€’C“C€“C”C€”C•C€•C–C€–C—C€—C˜C€˜C™C€™CšC€šC›C€›CœC€œCC€CžC€žCŸC€ŸC C€ C¡C€¡C¢C€¢C£C€£C¤C€¤C¥C€¥C¦C€¦C§C€§C¨C€¨C©C€©CªC€ªC«C€«C¬C€¬CC€C®C€®C¯C€¯C°C€°C±C€±C²C€²C³C€³C´C€´CµC€µC¶C€¶C·C€·C¸C€¸C¹C€¹CºC€ºC»C€»C¼C€¼C½C€½C¾C€¾C¿C€¿CÀC€ÀCÁC€ÁCÂC€ÂCÃC€ÃCÄC€ÄCÅC€ÅCÆC€ÆCÇC€ÇCÈCÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/d8c715e8e34d7d99„¦layout‡¦xaxis1ˆ§yanchor¦bottom¨tickvals× A¥title„¤font¤sizeÊA ¨standoffÊ?€¤text¹# Cars at second 
locationªautomarginÃ¦domain×ffæ>©linewidthÊ@¦mirrorÃ¦anchor¢y1©linecolor¥white¦yaxis1‡¨tickvals× A¥title„£pad¡lÊ¨standoffÊ?€¤text¸# Cars at first locationªautomarginÃ¦domain×€?©linewidthÊ@¦mirrorÃ¦anchor¢x1©linecolor¥white¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks 
®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks 
¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦xaxis2ˆ§yanchor¦bottom¨tickvals× A¥title„¤font¤sizeÊA ¨standoffÊ?€¤text¹# Cars at second locationªautomarginÃ¦domain×ÍÌ?€?©linewidthÊ@¦mirrorÃ¦anchor¢y2©linecolor¥white¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¦yaxis2‡¨tickvals× A¥title„£pad¡lÊ¨standoffÊ?€¤text¸# Cars at first locationªautomarginÃ¦domain×€?©linewidthÊ@¦mirrorÃ¦anchor¢x2©linecolor¥white«annotations’‰§yanchor¦bottom§xanchor¦center¡yÊ?€¤font¤sizeÊA€©showarrowÂ¤yref¥paper¤textª$\pi_{30}$¤xref¥paper¡xÊ>fff‰§yanchor¦bottom§xanchor¦center¡yÊ?€¤font¤sizeÊA€©showarrowÂ¤yref¥paper¤text®$v_{\pi_{30}}$¤xref¥paper¡xÊ?Fff¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data’‰¨colorbar©thicknessÊ@¥xaxis¢x1¡yÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¤type§heatmapªcolorscale¤RdBu¥yaxis¢y1¡zÜÇT€?@@@@€@€@ @ @ @ @ @ @ @ @ @ @ÇT€?€?@@@@@€@€@€@€@ @ @ @ @ @ @ @ÇT€?@@@@@@@@@@€@€@€@€@€@€@ @ @ÇT€?€?@@@@@@@@@@@@@@@@€@€@€@ÇT€?€?€?€?@@@@@@@@@@@@@@ÇT€?€?€?€?€?€?@@@@@@ÇT€?€?€?€?@@@ÇT€¿€?@@ÇT€¿€¿€?€?@ÇTÀ€¿€?@ÇTÀ€¿€¿€?@ÇTÀÀ€¿€?€?ÇT@ÀÀ€¿€?ÇT@ÀÀ€¿€?ÇT@ÀÀ€¿€¿€?ÇT@ÀÀÀ€¿€?ÇT@À@ÀÀ€¿ÇT€À@ÀÀ€¿ÇT€À@ÀÀ€¿€¿ÇT€À@ÀÀÀ€¿ÇT€À@À@ÀÀ€¿©transposeÃ¡xÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A 
A‰¨colorbar©thicknessÊ@¥xaxis¢x2¡yÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A A¤type§heatmapªcolorscale§Bluered¥yaxis¢y2¡zÜÇT‘ÒC¥×CæJÜC µàCTÁäCßyèCðîëCD0ïCHòCs;õC™ øCÒ¿úC§RýC²ÆÿCó D‡(D<1D7$DfûD‹¯Dã:DÇT š×C&ŠÜCnGáC{±åC½éCµuíCYêðC;+ôCwB÷Cl5úCúýC§¸ÿC}%D@_D‰Dõ£Ds¬D3ŸD5vD-*DfµDÇT¥‡ÜCÔwáCð4æC…žêC¼©îC:`òCÓõC¡ùC_&üC”ÿC·òD0JD’D€ÊDóDÊD#Dß DçÛ D D¦DÇT9UáCbEæC ëCbjïCmsóCç%÷Cï“úCÕÌýC‹mD6âDFDšDtÞDDˆ9DœO DT DùBD„D†ÇDiP DÇT'óåC*ãêC÷žïCôC& øCT¶ûCŽÿC%Dp¦DøDˆrD7ÀDjþDƒ- D—M D ^D]DîF DÚDãÂDžHDÇT³XêCpHïCôCføC>eüCrD,±D$ADcºD: DÛtD–¹Déî D_D-DU5 Dv,DèD ×D7~DPÿDÇT1„îC‘sóC«,øCí‹üCBD»DŸ´D<D‹«DøDÕP D¬Š D3µDÑDNÞ D~ÜDËÉDê¢D…bDKD„}DÇTxòCg÷CpüCÒ<D@5DYüDÚšDrD©~DíÎ D D;DÅY DëiDžkDx^D°@D;D%ÅD\DÑDÇT%8öCÀ&ûC8ÜÿC–D>DÌÏDgDµÜDÁ7 Dè|D“¯DŽÑ D5äDNèDÞDóÄD|›DØ^D} Dä˜DJDÇTLÈùCp¶þCø´D.ÞDÏD*‹DW D–ˆ DëÙD“ Dÿ;D]RDHYDQDy;D‚DXáDF™D€:DÓ¿Dý%DÇT,ýCÇD‚eDVŒDXyD0 D…¹ DÇDg DÒ—D®´D"ÀDÕ»D·¨D‡D¥VDDðÂDÎYD<ÖDB5DÇTn3DÔ©DlD¤%Dœ DÀ DéBD” DjàDâDçDD) DFïD¾ÂDA‡D·;DøÝDÔjD¶ÞDß6DÇTÂ½DË3DŠDŒ«DF D8<D³¸ DDêFD¯eDqoDífD$ND&D6ïDL©DWSDëDçnDÚDQ,DÇTg6Dý«D¦D> D€ÿDã¥ DþDRkD{›D²D!³D–¡DhD¼MD Dg½D¬]DhìDêfDîÊDkDÇTWžDoDjf DëD]] D þD²mD)¶DöÞDÂíDžæDÌDŠ¡DÒfDDÄD3[DáDKSD°DÀõDÇTRöDýjD)¼ DAÔDÆªD{ED§®DAðDóDCD^ D6èDÜ´D¿qDXD¾½DILDÂÉDY4D?ŠDŸÊDÇT®>DÜ² D$D’DõçDx|DßDæD4DÐ4D‡D¿ôDŸ¹D{nDDaªDî0D¦¦D" D¯YD•DÇTÓvDzê DÛ7 D[HDdDy¢DYþDj2D^FD±?D]"DrñDû®Df\DxúDT‰D€DwDÔDÓDŽTDÇTgœ D“DúZDSgDÑ-DLµD\ D±7DEDÊ7DéDIÜD“D›9DÆÐD¿XD6ÑD]9DGDüÔD~DÇT…ª D* D†fD¾nD–/Dd°D þDc%Dw,DDïDb±DüaD`D\“D6D¬‡DêDÈ;D|DçªDÇT<™DaDÊRDWDDD¿ŒD~ÔDõDeöD‚ÝD-®DkDPDT±Dö<Dˆ¹Dä&D€„D¶ÑDD:D©transposeÃ¡xÇT€?@@@€@ @À@à@AA A0A@APA`ApA€AˆAA˜A AÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/24aa7574d5705350„¦layout„¥xaxis¥title¤text¥State¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥titleºEstimated Value with TD(0)¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data•…¤line¥color¥black¡yÇ«ª*>«ªª>?«ª*?UUU?¤type§scatter¤name«True values¡x•¡A¡B¡C¡D¡E„¡yÇ?????¤type§scatter¤nameª0 episodes¡x•¡A¡B¡C¡D¡E„¡yÇffæ>????¤type§scatter¤nameª1 episodes¡x•¡A¡B¡C¡D¡E„¡yÇi]¤>Hfå>Ù?Õ¯?ÈŒ*?¤type§scatter¤name«10 episodes¡x•¡A¡B¡C¡D¡E„¡yÇ´ªH>§ýÊ>† ?Ü%M?Eói?¤type§scatter¤name¬100 episodes¡x•¡A¡B¡C¡D¡EÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/ac757a3486dcd2e1„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ(€?@@@€@ @À@à@AA A¥range×€?0A¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCR¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@€@ @À@à@¥range×€?A¦mirrorÃ¨ticktext—       ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤text¾Sarsa policy
Path Example¡xÊ?¥widthÊC–¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤dataš‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ@¤type§scatter¤text‘¡G¡xÖA‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×@`@¤type§scatter¤name¬Optimal Path¡x×À?À?‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×`@ @¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @À?¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×`@@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×@°@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×°@Ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @ @¤type§scatter¤name¬Optimal Path¡x×Ð@ð@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @@¤type§scatter¤name¬Optimal Path¡x×ð@AÙ59c6be96e-38f7-11f0-2d30-a71f02755abc/b5c0b7878012e9e3„¦layoutŠ¨autosizeÂ§paddingÊ¥xaxis‹¨showlineÃ©gridcolor¥black¨tickvalsÇ€?@@@¥range×€?€@¨ticktextÇ(€?€?€?@@€?©linecolor¥black¨showgridÃ¨gridwithÊ?€¨zerolineÃ¥title«Wind Values¦mirrorÃpaper_bgcolor°rgba(0, 0, 0, 0)¨template‚¦layoutÞ©coloraxis¨colorbar‚¥ticks ¬outlinewidthÊ¥xaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks 
zerolinecolor¥whiteªautomarginÃ©linecolor¥white©hovermode§closestpaper_bgcolor¥white£geo†©showlakesÃ¨showlandÃ©landcolor§#E5ECF6§bgcolor¥white¬subunitcolor¥white©lakecolor¥whiteªcolorscaleƒªsequentialš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921©diverging›’Ê§#8e0152’Ê=ÌÌÍ§#c51b7d’Ê>LÌÍ§#de77ae’Ê>™™š§#f1b6da’Ê>ÌÌÍ§#fde0ef’Ê?§#f7f7f7’Ê?™š§#e6f5d0’Ê?333§#b8e186’Ê?LÌÍ§#7fbc41’Ê?fff§#4d9221’Ê?€§#276419¯sequentialminusš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921¥yaxis‡©gridcolor¥whitezerolinewidthÊ@¥title¨standoffÊAp¥ticks zerolinecolor¥whiteªautomarginÃ©linecolor¥whiteshapedefaults¤line¥color§#2a3f5fªhoverlabel¥align¤left¦mapbox¥style¥light¥polarƒ«angularaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6ªradialaxisƒ©gridcolor¥white¥ticks ©linecolor¥white¯autotypenumbers¦strict¤font¥color§#2a3f5f§ternary„¥aaxisƒ©gridcolor¥white¥ticks ©linecolor¥white§bgcolor§#E5ECF6¥caxisƒ©gridcolor¥white¥ticks ©linecolor¥white¥baxisƒ©gridcolor¥white¥ticks ©linecolor¥white²annotationdefaultsƒ©arrowheadÊªarrowwidthÊ?€ªarrowcolor§#2a3f5f¬plot_bgcolor§#E5ECF6¥title¡xÊ=LÌÍ¥sceneƒ¥xaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥zaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¥yaxis‡©gridcolor¥white©gridwidthÊ@¯backgroundcolor§#E5ECF6¥ticks ®showbackgroundÃzerolinecolor¥white©linecolor¥white¨colorwayš§#636efa§#EF553B§#00cc96§#ab63fa§#FFA15A§#19d3f3§#FF6692§#B6E880§#FF97FF§#FECB52¤dataÞ®scatterpolargl‘‚¤type®scatterpolargl¦marker¨colorbar‚¥ticks 
¬outlinewidthÊ¦carpet‘ƒ¥baxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¤type¦carpet¥aaxis…©gridcolor¥white¬endlinecolor§#2a3f5f®minorgridcolor¥white®startlinecolor§#2a3f5f©linecolor¥white¬scatterpolar‘‚¤type¬scatterpolar¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©parcoords‘‚¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©parcoords§scatter‘‚¤type§scatter¦marker¨colorbar‚¥ticks ¬outlinewidthÊ²histogram2dcontour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type²histogram2dcontourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921§contour‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§contourªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattercarpet‘‚¤typescattercarpet¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦mesh3d‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤type¦mesh3d§surface‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§surfaceªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921scattermapbox‘‚¤typescattermapbox¦marker¨colorbar‚¥ticks ¬outlinewidthÊªscattergeo‘‚¤typeªscattergeo¦marker¨colorbar‚¥ticks ¬outlinewidthÊ©histogram‘‚¤type©histogram¦marker¨colorbar‚¥ticks ¬outlinewidthÊ£pie‘‚¤type£pieªautomarginÃªchoropleth‘‚¨colorbar‚¥ticks ¬outlinewidthÊ¤typeªchoropleth©heatmapgl‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type©heatmapglªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921£bar‘„¤type£bar§error_y¥color§#2a3f5f§error_x¥color§#2a3f5f¦marker¤line‚¥color§#E5ECF6¥widthÊ?§heatmap‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type§heatmapªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921contourcarpet‘‚¨colorbar‚¥ticks 
¬outlinewidthÊ¤typecontourcarpet¥table‘ƒ¤type¥table¦header‚¤line¥color¥white¤fill¥color§#C8D4E3¥cells‚¤line¥color¥white¤fill¥color§#EBF0F8©scatter3d‘ƒ¤line¨colorbar‚¥ticks ¬outlinewidthÊ¤type©scatter3d¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¨barpolar‘‚¤type¨barpolar¦marker¤line‚¥color§#E5ECF6¥widthÊ?©scattergl‘‚¤type©scattergl¦marker¨colorbar‚¥ticks ¬outlinewidthÊ«histogram2d‘ƒ¨colorbar‚¥ticks ¬outlinewidthÊ¤type«histogram2dªcolorscaleš’Ê§#0d0887’Ê=ãŽ9§#46039f’Ê>cŽ9§#7201a8’Ê>ªª«§#9c179e’Ê>ãŽ9§#bd3786’Ê?8ä§#d8576b’Ê?*ª«§#ed7953’Ê?Gr§#fb9f3a’Ê?cŽ9§#fdca26’Ê?€§#f0f921®scatterternary‘‚¤type®scatterternary¦marker¨colorbar‚¥ticks ¬outlinewidthÊ¦heightÊCH¦margin„¡lÊBH¡bÊBH¡rÊBH¡tÊBp¥yaxis‰¨showgridÃ¨showlineÃ©gridcolor¥black©gridwidthÊ?€¨tickvalsÇ€?@@@¥range×€?€@¦mirrorÃ¨ticktext“   ©linecolor¥black¥titleƒ¤font¤sizeÊA`¤textÙ Optimal policy
path example¡xÊ?¥widthÊCH¦config…¨showLinkÂ¨editableÂªresponsiveÃªstaticPlotÂªscrollZoomÃ¦frames¤data–‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖÀ?¤type§scatter¤text‘¡S¡xÖÀ?‡ªshowlegendÂ¤mode¤text¬textposition¤left¡yÖ`@¤type§scatter¤text‘¡G¡xÖ`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x×À? @‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À?À?¤type§scatter¤name¬Optimal Path¡x× @`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y×À? @¤type§scatter¤name¬Optimal Path¡x×`@`@‡ªshowlegendÂ¤mode¥lines¤line¥color¤blue¡y× @`@¤type§scatter¤name¬Optimal Path¡x×`@`@¥nbpkgŠ¯install_time_nsÎóž4t¬instantiatedÃ²installed_versionsŠSerialization¦stdlibªStatistics¦stdlib©StatsBase¦0.34.3«Transducers¦0.4.84LinearAlgebra¦stdlib§PlutoUI¦0.7.60°HypertextLiteral¥0.9.5¨Latexify¦0.16.5¬LaTeXStrings¥1.3.1«PlutoPlotly¥0.4.6°terminal_outputsŒªStatisticsÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`«TransducersÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`§PlutoUIÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`SerializationÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m 
[90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`LinearAlgebraÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`¬LaTeXStringsÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`©StatsBaseÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`¨LatexifyÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`¤BaseÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`ªnbpkg_syncÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`°HypertextLiteralÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No 
Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`«PlutoPlotlyÚ@ [0m[1mResolving...[22m [90m===[39m [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Project.toml` [32m[1m No Changes[22m[39m to `/tmp/jl_wBFXpg/Manifest.toml` [0m[1mInstantiating...[22m [90m===[39m [0m[1mPrecompiling...[22m [90m===[39m [32m[1m Activating[22m[39m project at `/tmp/jl_wBFXpg`§enabledÃ·restart_recommended_msgÀ´restart_required_msgÀbusy_packages¶waiting_for_permissionÂÙ,waiting_for_permission_but_probably_disabledÂ«cell_inputsÞáÙ$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4„§cell_idÙ$8ddf6b9d-d76d-401f-96ad-2a0b5c114fa4¤codeÚ¬function create_noisy_gridworld_mdp(mdp::MDP_TD, min_reward, max_reward) #this only works when the mdp is deterministic. add a version for the stochastic wind example ptf = zeros(Float32, length(mdp.states), 3, length(mdp.actions), length(mdp.states)) for s in mdp.states i_s = mdp.statelookup[s] if mdp.isterm(s) ptf[i_s, 1, :, i_s] .= 1.0f0 else for a in mdp.actions (r, sâ€²) = mdp.step(s, a) i_a = mdp.actionlookup[a] i_sâ€² = mdp.statelookup[sâ€²] i_s = mdp.statelookup[s] ptf[i_sâ€², 2, i_a, i_s] = 0.5f0 ptf[i_sâ€², 3, i_a, i_s] = 0.5f0 end end end FiniteMDP(mdp.states, mdp.actions, [0.0f0, min_reward, max_reward], ptf) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$5290ae65-6f56-4849-a842-fe347315c6dc„§cell_idÙ$5290ae65-6f56-4849-a842-fe347315c6dc¤codeÚèmd""" ## 6.2 Advantages of TD Prediction Methods TD methods can learn before an episode terminates, so this is an advantage in environments that have very long episodes. Also, in continuing problems, Monte Carlo methods may not be suitable at all because there is no termination condition. 
Furthermore, if we consider off-policy learning, Monte Carlo methods must ignore returns if exploratory actions (ones never taken by the target policy) are taken later in the episode whereas TD methods could learn from individual steps that are not exploratory regardless of what happens later on. For any fixed policy $v_\pi$ TD(0) has been proved to converge to $v_\pi$ in the mean for a constant step-size parameter if it is sufficiently small, and with probability 1 if the step-size parameter decreases according to the usual stochastic approximation conditions (2.7). Since both TD and Monte Carlo methods converge, one natural question is which converges faster, which makes more efficient use of limited data? There is no mathematical proof to this question, nor is it clear how to even pose it formally; however, TD methods have usually been found to converge faster than constant-Î± MC methods on stochastic tasks, as illustrated in Example 6.2. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$b3d4117f-7db4-43a6-8427-c08f3542d71f„§cell_idÙ$b3d4117f-7db4-43a6-8427-c08f3542d71f¤codeÙ1poisson(n, Î») = exp(-Î») * (Î»^n) / factorial(n)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$3ed12c33-ab0a-49b1-b9e7-c4305ba35767„§cell_idÙ$3ed12c33-ab0a-49b1-b9e7-c4305ba35767¤codeÚZ#take a step in the environment from state s using policy Ï€ and generate the subsequent action selection as well function init_step(mdp::MDP_TD{S, A, F, G, H}, Ï€::Matrix{T}, s::S) where {S, A, F<:Function, G<:Function, H<:Function, T<:Real} i_s = mdp.statelookup[s] i_a = sample_action(Ï€, i_s) a = mdp.actions[i_a] return (i_s, i_a, a) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$209881b3-3ac8-490e-97bd-fa5ae24a39f5„§cell_idÙ$209881b3-3ac8-490e-97bd-fa5ae24a39f5¤codeÚ#update the value function with the TD0 method using a single episode function update_value!(V::Vector{T}, ::TD0, Î±::T, Î³::T, mdp::MDP_TD{S, A, F, G, H}, states::Vector{S}, actions::Vector{A}, 
rewards::Vector{T}) where {T<:AbstractFloat, S, A, F<:Function, G<:Function, H<:Function} l = length(states) err = zero(T) for i in 1:l-1 s = states[i] sâ€² = states[i+1] i_s = mdp.statelookup[s] v_old = V[i_s] i_sâ€² = mdp.statelookup[sâ€²] v_new = v_old + Î±*(rewards[i] + Î³*V[i_sâ€²] - v_old) err = max(err, calc_error(v_old, v_new)) V[i_s] = v_new end #perform update for terminal state s = last(states) i_s = mdp.statelookup[s] v_old = V[i_s] v_new = v_old + Î±*(rewards[l] - v_old) err = max(err, calc_error(v_old, v_new)) V[i_s] = v_new return err end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$6e06bd39-486f-425a-bbca-bf363b58988c„§cell_idÙ$6e06bd39-486f-425a-bbca-bf363b58988c¤codeÚmd""" ## 6.6 Expected Sarsa Consider the learning algorithm that is just like Q-learning except that intsead of the maximization over next state-action pairs it uses the expected value, taking into account how likely each action is under the current policy. That is consider the algorithm with the update rule $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left [ R_{t+1} + \gamma \text{E}_\pi [Q(S_{t+1}, A_{t+1})|S_{t+1}] - Q(S_t, A_t) \right ]$ $= Q(S_t, A_t) + \alpha \left [ R_{t+1} + \gamma \sum_a \pi(a|S_{t+1})Q(S_{t+1}, a) - Q(S_t, A_t) \right ]$ but that otherwise follows the scheme of Q-learning. Given the next state, $S_{t+1}$, this algorithm moves *deterministically* in the same direction as Sarsa moves *in expectation*, and accordingly it is called *Expected Sarsa*. Although more computationally complex than Sarsa, it eliminates the variance due to the random selection of $A_{t+1}$ In general Expected Sarsa might use a policy different from the target policy Ï€ to generate behavior in which case it becomes an off-policy algorithm. For example, supppose Ï€ is the greedy policy while behavior is more exploratory; then Expected Sarsa is exactly Q-learning. In this sense Expected Sarsa subsumes and generalizes Q-learning while reliably improving over Sarsa. 
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$e039a5be-4b59-4023-be97-2d1de970be27„§cell_idÙ$e039a5be-4b59-4023-be97-2d1de970be27¤codeÙ,md""" ### Double Learning Implementation """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$2786101e-d365-4d6a-8de7-b9794499efb4„§cell_idÙ$2786101e-d365-4d6a-8de7-b9794499efb4¤codeÚ‰function example_6_2(;l = 5, max_episodes = 100, nruns = 100, vinit = 0.5f0) mrp = make_mrp(l = l) π = make_random_policy(mrp) true_values = collect(1:l) ./ (l+1) get_rw_names(l) = string.(Iterators.take('A':'Z', l) |> collect) (_, td0_est) = tabular_TD0_pred_V(π, mrp, 0.1f0, 1.0f0; num_episodes = 100, vinit = vinit, save_states = collect(1:l)) traces = [scatter(x = get_rw_names(l), y = td0_est[:, n], name = "$(n-1) episodes") for n in [1, 2, 11, 101]] tv_trace = scatter(x = get_rw_names(l), y = true_values, name = "True values", line_color="black") p1 = plot([tv_trace; traces], Layout(title = "Estimated Value with TD(0)", xaxis_title = "State")) calc_rms(v_saves) = [sqrt(mean((v .- true_values) .^2)) for v in eachcol(v_saves)] run_estimate(f, α, n) = f(π, mrp, α, 1.0f0; num_episodes = n, vinit = vinit, save_states = collect(1:l)) td_αs = [0.05f0, 0.1f0, 0.15f0] mc_αs = 0.01f0:0.01f0:0.04f0 |> collect td_est = [mean([calc_rms(last(run_estimate(tabular_TD0_pred_V, α, max_episodes))) for _ in 1:nruns]) for α in td_αs] mc_est = [mean([calc_rms(last(run_estimate(monte_carlo_pred_V, α, max_episodes))) for _ in 1:nruns]) for α in mc_αs] td_traces = [scatter(x = collect(1:max_episodes), y = td_est[i], name = "$(i == 1 ? "TD" : "") α = $(td_αs[i])", line_color = "rgba(0, 0, 255, $(i/3))") for i in eachindex(td_est)] mc_traces = [scatter(x = collect(1:max_episodes), y = mc_est[i], name = "$(i == 1 ? 
"MC" : "") α = $(mc_αs[i])", line_color = "rgba(255, 0, 0, $(i/5))") for i in eachindex(mc_est)] p2 = plot([td_traces; mc_traces], Layout(xaxis_title = "Walks / Episodes", title = "Empirical RMS error, averaged over states")) @htl("""

$p1 $p2

$(md"""$\cdots \:$""")

$(md"""$S_t$""")

$(md"""$A_t$""")

$(md"""$R_{t+1}$""")

$(md"""$S_{t+1}$""")

$(md"""$A_{t+1}$""")

$(md"""$R_{t+2}$""")

$(md"""$S_{t+2}$""")

$(md"""$A_{t+2}$""")

$(md"""$R_{t+3}$""")

$(md"""$S_{t+3}$""")

$(md"""$\:\cdots$""")

""")¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$2651af2d-56a8-4f7e-a56a-45cabd665c72„§cell_idÙ$2651af2d-56a8-4f7e-a56a-45cabd665c72¤codeÙ4 max_bias_visualization_comp(;max_visual_params2...)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$620a6426-cb29-4010-997b-aa4f9d5f8fb0„§cell_idÙ$620a6426-cb29-4010-997b-aa4f9d5f8fb0¤codeÙebegin abstract type BatchMethod end struct TD0 <: BatchMethod end struct MC <: BatchMethod end end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$889611fb-7dac-4769-9251-9a90e3a1422f„§cell_idÙ$889611fb-7dac-4769-9251-9a90e3a1422f¤codeÙSfunction statestyle(s) """ .circlestate.$s::before { content: '$s'; } """ end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$5455fc97-55cb-4b0e-a3be-9433ccc96fc0„§cell_idÙ$5455fc97-55cb-4b0e-a3be-9433ccc96fc0¤codeÙámd""" Number of States: $(@bind nstates Slider(3:10, default = 5, show_value=true)) Animation Interval (s): $(@bind delay Slider(0.1:0.1:1.0, default = 0.5, show_value=true)) $(@bind start_mrp Button("New Random Walk")) """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$24a441c8-7aaf-4642-b245-5e1201456d67„§cell_idÙ$24a441c8-7aaf-4642-b245-5e1201456d67¤codeÚûfunction check_policy(π::Matrix{T}, mdp::MDP_TD) where {T <: AbstractFloat} #checks to make sure that a policy is defined over the same space as an MDP (n, m) = size(π) num_actions = length(mdp.actions) num_states = length(mdp.states) @assert n == num_actions "The policy distribution length $n does not match the number of actions in the mdp of $(num_actions)" @assert m == num_states "The policy is defined over $m states which does not match the mdp state count of $num_states" return nothing end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdab„§cell_idÙ$1e45a661-c2e1-40c2-b27b-5f80f95efdab¤codeÙÂshow_gridworld_policy_value(stochastic_gridworld, q_learning(stochastic_gridworld, 0.1f0, 1.0f0; num_episodes = 2000); 
action_display = king_action_display, policy_display = display_king_policy)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$21fbdc3b-4444-4f56-9934-fb58e184d685„§cell_idÙ$21fbdc3b-4444-4f56-9934-fb58e184d685¤codeÙNmd""" Load existing figure: $(@bind fig_6_3_load CheckBox(default = true)) """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$30e663da-282c-42ff-8171-dbe3c5c467c6„§cell_idÙ$30e663da-282c-42ff-8171-dbe3c5c467c6¤codeÚâfunction makepolicyvalueplots(mdp::CompleteMDP, v::Vector{T}, π::Matrix{T}, iter::Integer; policycolorscale = "RdBu", valuecolorscale = "Bluered", kwargs...) where T <: Real (policymap, valuemap) = makepolicyvaluemaps(mdp, v, π) layout = Layout(autosize = false, height = 220, width = 230, paper_bgcolor = "rgba(30, 30, 30, 1)", margin = attr(l = 0, t = 0, r = 0, b = 0, padding = 0), xaxis = attr(title = attr(text = "# Cars at second location", font_size = 10, standoff = 1, automargin = true), tickvals = [0, 20], linecolor = "white", mirror = true, linewidth = 2, yanchor = "bottom"), yaxis = attr(title = attr(text = "# Cars at first location", standoff = 1, automargin = true, pad_l = 0), tickvals = [0, 20], linecolor = "white", mirror = true, linewidth = 2), font_color = "gray", font_size = 9) function makeplot(z, colorscale; kwargs...) 
tr = heatmap(;x = 0:20, y = 0:20, z = z, colorscale = colorscale, colorbar_thickness = 2) plot(tr, layout) end vtitle = L"v_{\pi_{%$(iter-1)}}" policyplot = relayout(makeplot(policymap, policycolorscale), (title = attr(text = latexify("π_$(iter-1)"), x = 0.5, xanchor = "center", font_size = 20, automargin = true, yref = "paper", yanchor = "bottom", pad_b = 10))) valueplot = relayout(makeplot(valuemap, valuecolorscale), (title = attr(text = vtitle, x = 0.5, xanchor = "center", font_size = 20, automargin = true, yref = "paper", yanchor = "bottom", pad_b = 10))) (π = relayout(policyplot, kwargs), v = relayout(valueplot, kwargs)) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4„§cell_idÙ$9651f823-e1cd-4e6e-9ce0-be9ea1c3f0a4¤codeÚfunction display_king_policy(v::Vector{T}; scale = 1.0) where T<:AbstractFloat @htl("""

""") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$84a71bf8-0d66-42cd-ac7b-589d63a16eda„§cell_idÙ$84a71bf8-0d66-42cd-ac7b-589d63a16eda¤codeÙåfunction create_greedy_policy(Q::Matrix{T}; c = 1000, π = copy(Q)) where T<:Real vhold = zeros(T, size(Q, 1)) for j in 1:size(Q, 2) vhold .= Q[:, j] make_greedy_policy!(vhold; c = c) π[:, j] .= vhold end return π end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$c9f7646a-ec01-4d90-9215-5027b7c1c885„§cell_idÙ$c9f7646a-ec01-4d90-9215-5027b7c1c885¤codeÙ¡md""" ### Q-learning Instability at Higher Learning Rate Learning Rate $\alpha$ $(@bind α_6_8 Slider(0.01f0:0.01f0:0.5f0, default = 0.3f0, show_value=true)) """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$8e34202a-f841-4464-9017-cd50194f7987„§cell_idÙ$8e34202a-f841-4464-9017-cd50194f7987¤codeÙŸfunction make_random_policy(mdp::MDP_TD; init::T = 1.0f0) where T <: AbstractFloat ones(T, length(mdp.actions), length(mdp.states)) ./ length(mdp.actions) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$95245673-2c29-401e-bb4b-a39dc8172297„§cell_idÙ$95245673-2c29-401e-bb4b-a39dc8172297¤codeÚöfunction create_gridworld_mdp(width, height, start, goal, wind, actions, step_reward) mdp = make_windy_gridworld(;actions = actions, apply_wind = apply_wind, sterm = goal, start = start, xmax = width, ymax = height, winds = wind, get_step_reward = () -> step_reward) ptf = zeros(Float32, length(mdp.states), 2, length(mdp.actions), length(mdp.states)) for s in mdp.states i_s = mdp.statelookup[s] if mdp.isterm(s) ptf[i_s, 1, :, i_s] .= 1.0f0 else for a in mdp.actions (r, s′) = mdp.step(s, a) i_a = mdp.actionlookup[a] i_s′ = mdp.statelookup[s′] ptf[i_s′, 2, i_a, i_s] = 1.0f0 end end end FiniteMDP(mdp.states, mdp.actions, [0.0f0, step_reward], ptf) 
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$c34678f6-53bb-4f2a-96f0-a7b16f894ddd„§cell_idÙ$c34678f6-53bb-4f2a-96f0-a7b16f894ddd¤codeÚŠfunction show_gridworld_policy_value(mdp, results; winds = wind_vals, action_display = rook_action_display, policy_display = display_rook_policy) Q, π = results policy_display = show_grid_policy(mdp, π, winds, policy_display, String(rand('A':'Z', 10)); action_display = action_display, scale = .8) value_display = show_grid_value(mdp, Q, winds, String(rand('A':'Z', 10)); action_display = action_display, scale = .8) path = plot_path(mdp, π) @htl("""

$policy_display

$value_display

$path

""") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$e4e80015-40ce-4f8a-aac7-4a9584da4baa„§cell_idÙ$e4e80015-40ce-4f8a-aac7-4a9584da4baa¤codeÙ$example_6_8(;loadfile = ex_6_8_load)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$64fe8336-d1c2-41fe-a522-1b6f63260fc9„§cell_idÙ$64fe8336-d1c2-41fe-a522-1b6f63260fc9¤codeÙ*const π_mrp = make_random_policy(mrp_6_2)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$dea61907-d4fb-492d-b2bb-c037c7f785cb„§cell_idÙ$dea61907-d4fb-492d-b2bb-c037c7f785cb¤codeÚefunction bellman_optimal_value!(V::Vector{T}, mdp::FiniteMDP{T, S, A}, γ::T) where {T <: Real, S, A} delt = zero(T) @inbounds @fastmath @simd for i_s in eachindex(mdp.states) maxvalue = typemin(T) @inbounds @fastmath @simd for i_a in eachindex(mdp.actions) x = zero(T) for (i_r, r) in enumerate(mdp.rewards) @inbounds @fastmath @simd for i_s′ in eachindex(V) x += mdp.ptf[i_s′, i_r, i_a, i_s] * (r + γ * V[i_s′]) end end maxvalue = max(maxvalue, x) end delt = max(delt, abs(maxvalue - V[i_s]) / (eps(abs(V[i_s])) + abs(V[i_s]))) V[i_s] = maxvalue end return delt end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cb„§cell_idÙ$678cad7a-1abb-4fcc-91ba-b5abcbb914cb¤codeÚôfunction show_grid_value(mdp, V::Vector, wind::Vector, name; action_display = king_action_display, scale = 1.0) width = maximum(s.x for s in mdp.states) height = maximum(s.y for s in mdp.states) start = mdp.state_init() termind = findfirst(mdp.isterm, mdp.states) sterm = mdp.states[termind] ngrid = width*height @htl("""

$(HTML(mapreduce(i -> """

$(round(V[i], sigdigits = 2))

""", *, eachindex(mdp.states))))

$(HTML(mapreduce(i -> """

$(wind[i])

""", *, 1:width)))

$(action_display)

Wind Values

""") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$d299d800-a64e-4ba2-9603-efa833343405„§cell_idÙ$d299d800-a64e-4ba2-9603-efa833343405¤codeÚ function example_6_5(;mdp = windy_gridworld, num_episodes = 170, action_display = rook_action_display, policy_display = display_rook_policy, use_stochastic_dp=false) (Qstar, πstar, steps, rewards) = sarsa(mdp, 0.5f0, 1.0f0; ϵinit = 0.1f0, num_episodes = num_episodes, decay_ϵ = false) # eg = runepisode(mdp, create_greedy_policy(Qstar)) eg = runepisode(mdp, πstar; max_steps = 100_000) mdp_dp = use_stochastic_dp ? stochastic_gridworld_mdp_dp : create_gridworld_mdp(mdp, -1.0f0) v_dp, π_dp = begin_value_iteration_v(mdp_dp, 1.0f0) path_dp = plot_path(mdp, π_dp; title = "Value Iteration Policy
Path Example") policy_display_dp = show_grid_policy(mdp, π_dp, wind_vals, policy_display, String(rand('A':'Z', 10)); action_display = action_display, scale = 1.0) value_display_dp = show_grid_value(mdp, v_dp[end], wind_vals, String(rand('A':'Z', 10)); action_display = action_display, scale = 1.0) start_trace = scatter(x = [1.5], y = [4.5], mode = "text", text = ["S"], textposition = "left", showlegend=false) finish_trace = scatter(x = [8.5], y = [4.5], mode = "text", text = ["G"], textposition = "left", showlegend=false) path_traces = [scatter(x = [eg[1][i].x + 0.5, eg[1][i+1].x + 0.5], y = [eg[1][i].y + 0.5, eg[1][i+1].y + 0.5], line_color = "blue", mode = "lines", showlegend=false, name = "Optimal Path") for i in 1:length(eg[1])-1] finalpath = scatter(x = [eg[1][end].x + 0.5, 8.5], y = [eg[1][end].y + 0.5, 4.5], line_color = "blue", mode = "lines", showlegend=false, name = "Optimal Path") p1 = plot(scatter(x = cumsum(steps), y = 1:num_episodes, line_color = "red"), Layout(xaxis_title = "Time steps", yaxis_title = "Episodes")) p2 = plot([start_trace; finish_trace; path_traces; finalpath], Layout(xaxis = attr(showgrid = true, showline = true, gridwidth = 1, gridcolor = "black", zeroline = true, linecolor = "black", mirror=true, tickvals = 1:10, ticktext = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0], range = [1, 11], title = "Wind Values"), yaxis = attr(linecolor="black", mirror = true, gridcolor = "black", showgrid = true, gridwidth = 1, showline = true, tickvals = 1:7, ticktext = fill("", 7), range = [1, 8]), width = 300, height = 210, autosize = false, padding=0, paper_bgcolor = "rgba(0, 0, 0, 0)", title = attr(text = "Sarsa policy
Path Example", font_size = 14, x = 0.5))) p3 = plot(scatter(x = 1:num_episodes, y = steps), Layout(xaxis_title = "Episodes", yaxis_title = "Steps Per Episode", yaxis_type = "log")) policy_display = show_grid_policy(mdp, πstar, wind_vals, policy_display, String(rand('A':'Z', 10)); action_display = action_display, scale = 1.0) value_display = show_grid_value(mdp, Qstar, wind_vals, String(rand('A':'Z', 10)); action_display = action_display, scale = 1.0) return @htl("""

$p1

$p2

$path_dp

$p3 Sarsa Solution

$policy_display $value_display

Value Iteration Solution

$policy_display_dp $value_display_dp

""") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$c5718459-2323-4615-b2c4-f92a0fa189d9„§cell_idÙ$c5718459-2323-4615-b2c4-f92a0fa189d9¤codeÚ ?md""" Let $\mathcal{M}$ be the set of labels of estimators that maximize the expected values of $X$: $$\mathcal{M} \doteq \left \{ j \mid \mathbb{E} \{ X_j \} = \max_i \mathbb{E} \{ X_i \} \right \}$$ Let $Max(S)$ be the set of labels of estimators that yield the maximum estimate for some set of samples $S$: $$Max(S) \doteq \left \{ j \mid \mu_j(S) = \max_i \mu_i(S) \right \}$$ The claim is that for all $j \in \mathcal{M}$ $$\mathbb{E} \{ \max_i \mu_i \} \geq \mathbb{E} \{ \mu_j \} = \mathbb{E} \{ X_j \} \doteq \max_i \mathbb{E} \{ X_i \} \tag{d}$$ *Proof*. Assume $j \in \mathcal{M}$, i.e. $\mu_j$ is any estimator whose expected value is maximal. Then, conditioning on whether $j \in Max$, $$\begin{flalign} \mathbb{E} \{ \max_i \mu_i \} &= P(j \in Max) \mathbb{E} \{ \max_i \mu_i \vert j \in Max \} + P(j \notin Max) \mathbb{E} \{ \max_i \mu_i \vert j \notin Max \} \\ &= P(j \in Max) \mathbb{E} \{\mu_j \vert j \in Max \} + P(j \notin Max) \mathbb{E} \{ \max_i \mu_i \vert j \notin Max \} \\ &\geq P(j \in Max) \mathbb{E} \{\mu_j \vert j \in Max \} + P(j \notin Max) \mathbb{E} \{ \mu_j \vert j \notin Max \} \\ &=\mathbb{E} \{ \mu_j \} = \mathbb{E} \{X_j\} \doteq \max_i \mathbb{E} \{ X_i \} \end{flalign}$$ The second line holds because $\max_i \mu_i = \mu_j$ whenever $j \in Max$. The third line follows from the definition of $Max$, which implies $\max_i \mu_i \gt \mu_j$ whenever $j \notin Max$, so $\mathbb{E} \{ \max_i \mu_i \vert j \notin Max \} \gt \mathbb{E} \{ \mu_j \vert j \notin Max \}$ whenever $P(j \notin Max) \gt 0$. Therefore the inequality is strict if and only if $P(j \notin Max) \gt 0$ for some $j \in \mathcal{M}$. If we do not know whether this is the case, we do not know whether the inequality in $(d)$ is strict, and therefore in general we write $\mathbb{E} \{ \max_i \mu_i \} \geq \max_i \mathbb{E} \{ X_i \}$, which proves the claim. Recall that $j$ is assumed to be in the set $\mathcal{M}$, meaning it has a maximizing expected value, while the set $Max(S)$ contains the variables that produce the maximum estimate over some sample $S$. 
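
Inequality (d) can also be checked numerically. The sketch below illustrates the result under assumed conditions and is not part of the proof. It assumes each `X_i` is Gaussian with unit variance and each estimator `μ_i` is the mean of `n` samples. `max_of_means_bias` is a hypothetical helper that estimates the expected maximum sample mean minus the true maximum expected value.

- **Equal true means:** the estimates often rank the variables incorrectly, so the bias is clearly positive.
- **One mean far above the others:** a wrong ranking is extremely unlikely, so the bias is close to zero.

```julia
using Random, Statistics

# Hypothetical sketch: Monte Carlo estimate of E[max_i μ_i] - max_i E[X_i]
# Assumptions: X_i ~ Normal(μtrue[i], 1) and μ_i is the mean of n samples of X_i.
function max_of_means_bias(; μtrue = zeros(10), n = 5, trials = 100_000,
                           rng = MersenneTwister(1))
    k = length(μtrue)
    maxes = zeros(trials)
    for t in 1:trials
        # sample-mean estimators μ_i(S) for one fresh sample set S
        μhat = [mean(μtrue[i] .+ randn(rng, n)) for i in 1:k]
        maxes[t] = maximum(μhat)
    end
    return mean(maxes) - maximum(μtrue)
end

max_of_means_bias()                      # equal true means: clearly positive bias
max_of_means_bias(μtrue = [0.0, 10.0])   # well-separated means: bias near 0
```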
So, intuitively, the proof says that the maximum of the estimators is a biased estimate of the maximum expected value. Its expectation exceeds the true maximum unless there is zero probability that the variable producing the highest estimate for a given sample lies outside the true set of maximizing variables. This means that unless the underlying distributions of the variables do not overlap (in which case the ranking of estimates always matches the ranking of true expected values), there is an expected positive bias. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$c306867b-f137-44f2-97dd-3d10c226ca5c„§cell_idÙ$c306867b-f137-44f2-97dd-3d10c226ca5c¤codeÚ ümd""" Consider instead policy improvement with afterstate value estimates $W_\pi(y)$, where we seek to choose a policy that is greedy with respect to the afterstate values: $\pi^\prime(s) = \mathrm{argmax}_a \left( f_2(s, a) + W_\pi(f_1(s, a)) \right)$ where $f_1$ and $f_2$ are the deterministic functions defined above that determine which afterstate is reached from $(s, a)$ and what intermediate reward is received. This looks much closer to policy improvement with $Q(s, a)$, because $Q_\pi(s, a) = f_2(s, a) + W_\pi(f_1(s, a))$. So, if we use afterstates, we get the benefits of learning the state-action value function while only saving values for the afterstates; the functions $f_1$ and $f_2$ provide all the extra information needed to recover the action values. Continuing the comparison to value iteration, recall that we adapted the Bellman optimality equation for the state value function into a single update rule to estimate $V^*(s)$: $$V^*(s) = \max_a Q^*(s, a) = \max_a \sum_{r, s^\prime} p(r, s^\prime \vert s, a) (r + \gamma V^*(s^\prime))$$ We can only apply this update rule if we have $p(r, s^\prime \vert s, a)$; otherwise we must estimate $Q^*$ instead and sample the transitions from the environment. 
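
The greedy policy improvement step with afterstates can be sketched as follows. This is a rough illustration under stated assumptions, not the notebook's implementation. It supposes `f_1` and `f_2` are stored as matrices indexed `[i_a, i_s]`, the way `afterstate_map` and `reward_interim_map` are built in `create_car_rental_afterstate_mdp`. The hypothetical helpers below rebuild every action value from the afterstate values `W` and then pick the greedy action in each state, so no action values need to be stored.

```julia
# Hypothetical sketch: Q(s, a) = f_2(s, a) + W(f_1(s, a)) for every action/state pair
#   W[i_y]        afterstate values
#   f1[i_a, i_s]  index of the afterstate reached from (s, a)
#   f2[i_a, i_s]  intermediate reward received on the way to that afterstate
afterstate_q(W::Vector{T}, f1::Matrix{Int}, f2::Matrix{T}) where {T<:Real} = f2 .+ W[f1]

# Greedy policy improvement: index of the best action in each state (one column per state)
greedy_actions(W, f1, f2) = [argmax(col) for col in eachcol(afterstate_q(W, f1, f2))]

W  = [0.0, 5.0, 1.0]        # 3 afterstates
f1 = [1 3; 2 3]             # in state 2, both actions lead to afterstate 3
f2 = [0.0 -1.0; -2.0 0.0]
afterstate_q(W, f1, f2)     # [0.0 0.0; 3.0 1.0]
greedy_actions(W, f1, f2)   # [2, 2]
```

In state 2 both actions share afterstate 3, so their values differ only through the intermediate reward `f2`.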
To estimate $W^*(y)$, we need to represent the Bellman optimality equation for the afterstate value function instead of the state value function: $$\begin{flalign} W^*(y) &= \sum_{r, s^\prime} p(r, s^\prime \vert y)(r + \gamma \max_a(f_2(s^\prime, a) + W^*(f_1(s^\prime, a)))) \\ &= \sum_{r, s^\prime} p(r, s^\prime \vert y)r + \gamma \sum_{s^\prime} p(s^\prime \vert y) \max_a(f_2(s^\prime, a) + W^*(f_1(s^\prime, a))) \end{flalign}$$ where $p(s^\prime \vert y) = \sum_r p(r, s^\prime \vert y)$. The outer sum just represents an expected value over the transition out of $y$, so if we don't have access to $p(r, s^\prime \vert y)$, we could sample the transitions from the environment instead. The $\max_a$ term can now be calculated directly from the stored afterstate values. It only requires finding the maximum entry of a vector for each transition state, and it does not depend on the reward. With state values, the maximization step requires evaluating a double sum over $r$ and $s^\prime$ for every action, so each update with afterstates is less costly. The afterstates themselves might also be more informative, in the sense that each afterstate has its own value, shared by every state-action pair that leads to it. If many of the actions from a given state lead to the same afterstate, this method immediately treats them as equivalent (apart from any difference in intermediate reward). With the usual value iteration, that equivalence would have to be computed through the probability transition function. The benefits of using an afterstate value function depend on how effectively the environment transitions can be separated into informative deterministic steps and limited stochastic dynamics. 
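
Below is a minimal sketch of one sweep of afterstate value iteration, based on the afterstate optimality equation above. It rests on three assumptions:

- `ptf[i_s′, i_r, i_y]` stores p(s′, r | y), matching the layout of the `ptf` array built in `create_car_rental_afterstate_mdp`.
- `f_1` and `f_2` are matrices indexed `[i_a, i_s]`.
- The sweep is synchronous (Jacobi-style), not the in-place update used by `bellman_optimal_value!`.

The function name is hypothetical. The maximum over actions is computed once per transition state at the start of the sweep, which is the saving described above.

```julia
# Hypothetical sketch of one synchronous sweep of afterstate value iteration
#   ptf[i_s′, i_r, i_y] = p(s′, r | y)
#   f1[i_a, i_s′] = afterstate index reached from (s′, a), f2[i_a, i_s′] = intermediate reward
function afterstate_value_sweep!(W::Vector{T}, ptf::Array{T,3}, rewards::Vector{T},
                                 f1::Matrix{Int}, f2::Matrix{T}, γ::T) where {T<:Real}
    # max_a (f_2(s′, a) + W(f_1(s′, a))) for every s′, computed once from the current W
    vmax = vec(maximum(f2 .+ W[f1]; dims = 1))
    Wnew = similar(W)
    for i_y in eachindex(W)
        w = zero(T)
        for i_r in eachindex(rewards), i_s′ in eachindex(vmax)
            w += ptf[i_s′, i_r, i_y] * (rewards[i_r] + γ * vmax[i_s′])
        end
        Wnew[i_y] = w
    end
    delta = maximum(abs.(Wnew .- W))   # largest change in this sweep
    W .= Wnew
    return delta
end

# Tiny check: one state, one action, one afterstate; from y we always reach s with reward 1
W = zeros(1)
for _ in 1:60
    afterstate_value_sweep!(W, ones(1, 1, 1), [1.0], reshape([1], 1, 1), zeros(1, 1), 0.5)
end
W[1]   # ≈ 2.0, the fixed point of W = 1 + 0.5W
```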
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0„§cell_idÙ$a4c4d5f2-d76d-425e-b8c9-9047fe53c4f0¤codeÙ&gridworld_Q_vs_sarsa_solve(cliffworld)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$410abe1d-04a6-4434-9abf-0d29dd6498e6„§cell_idÙ$410abe1d-04a6-4434-9abf-0d29dd6498e6¤codeÙ*md""" ### Tabular TD(0) Implementation """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$aa0791a5-8cf1-499b-9900-4d0c59be808c„§cell_idÙ$aa0791a5-8cf1-499b-9900-4d0c59be808c¤codeÙ`function stochastic_wind(w, x, y) w == 0 && return (x, y) v = rand(wind_var) (x, y+w+v) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$510761f6-66c7-4faf-937b-e1422ec829a6„§cell_idÙ$510761f6-66c7-4faf-937b-e1422ec829a6¤codeÚÉHTML(""" """)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$0b9c6dbd-4eb3-4167-886e-64db9ec7ff04„§cell_idÙ$0b9c6dbd-4eb3-4167-886e-64db9ec7ff04¤codeÚÀmd""" > ### *Exercise 6.3* > From the results shown in the left graph of the random walk example it appears that the first episode results in a change only in $V(A)$. What does this tell you about what happened on the first episode? Why was only the estimate for this one state changed? By exactly how much was it changed? The update rule for TD(0) learning is $V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$. All states A, B, C, D, and E are initialized to 0.5, and the terminal state is initialized to 0. During the first episode, every transition before the last one has a reward of 0 and moves between two states whose estimates are both 0.5. The TD error for those transitions is therefore 0 and the value function does not change. Only the final transition into the terminal state can produce a nonzero error, which is why only one estimate changed. Since the value estimate for state A decreased from its initial value, the first episode must have terminated on the left. For this final transition we have the following update. 
$V(A) \leftarrow V(A) + \alpha[0 + \gamma V(\text{Term}) - V(A)]$. We know that prior to the update $V(A) = 0.5$, $V(\text{Term}) = 0$, and $\gamma=1$, so the update is $V(A) \leftarrow 0.5 + \alpha[0 - 0.5]$. For this plot $\alpha=0.1$, so the updated value is $V(A) = 0.5+0.1(-0.5)=0.5-0.05=0.45$, meaning the estimate changed by exactly $-0.05$. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$a9dda9b5-f568-481c-9e8f-9bb887468775„§cell_idÙ$a9dda9b5-f568-481c-9e8f-9bb887468775¤codeÙ$md""" #### Random Walk MDP Setup """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$ad03500a-bd42-4216-a9cb-3f923152af79„§cell_idÙ$ad03500a-bd42-4216-a9cb-3f923152af79¤codeÚ×function create_car_rental_afterstate_mdp(;nmax=20, λs::@NamedTuple{request_A::T, request_B::T, return_A::T, return_B::T} = (request_A = 3f0, request_B = 4f0, return_A = 3f0, return_B = 2f0), movecost::T = 2f0, rentcredit::T = 10f0, movemax::Integer=5, maxovernight::Integer = 20, overnightpenalty::T = 4f0, employeeshuttle = false) where T <: Real #enumerate all states and afterstates states = [(n_a, n_b) for n_a in 0:nmax for n_b in 0:nmax] afterstates = [(n_a, n_b) for n_a in 0:nmax for n_b in 0:nmax] actions = collect(-movemax:movemax) afterstate_lookup = makelookup(afterstates) #enumerate all rewards by simply incrementing by 1 dollar from the worst to best case scenario rewards = collect(-movecost*movemax - 2*overnightpenalty:rentcredit*nmax*2) reward_lookup = Dict(zip(rewards, eachindex(rewards))) #mapping from rewards to the proper index #create a lookup for the probability of starting with n cars at the start of the day and ending up with n′ at the end of the day function create_probability_lookup(λ_request, λ_return) #can only rent from 0 to n cars.
if requests exceed n, all of those situations are equivalent and the probability is 1 - p(x < n) p_rent = Dict(n_request => poisson(n_request, λ_request) for n_request in 0:nmax-1) #car returns can be any number greater than or equal to 0, but all returns of nmax - (n - nrent) or more will result in the same state which is max cars p_return = Dict(n_return => poisson(n_return, λ_return) for n_return in 0:nmax-1) #initialize probabilities for each final value at 0 prob_lookup = Dict((t, nrent) => 0f0 for t in states for nrent in 0:t[1]) for n in 0:nmax for n_rent in 0:n-1 for n_return in 0:(nmax - n + n_rent - 1) n′ = n - n_rent + n_return p = p_rent[n_rent]*p_return[n_return] prob_lookup[((n, n′), n_rent)] += p end prob_lookup[((n, nmax), n_rent)] += p_rent[n_rent]*(1 - sum(p_return[n_return] for n_return in 0:nmax-n+n_rent-1; init = zero(T))) end for n_return in 0:(nmax - 1) n′ = n_return p = (1 - sum(p_rent[n_rent] for n_rent in 0:n-1; init = zero(T)))*p_return[n_return] prob_lookup[((n, n′), n)] += p end prob_lookup[((n, nmax), n)] += (1 - sum(p_rent[n_rent] for n_rent in 0:n-1; init = zero(T)))*(1 - sum(p_return[n_return] for n_return in 0:nmax-1; init = zero(T))) end return prob_lookup end probabilities = (location_A = create_probability_lookup(λs.request_A, λs.return_A), location_B = create_probability_lookup(λs.request_B, λs.return_B)) #calculate the deterministic afterstate and intermediate reward reached from state s when taking action a function get_afterstate_transition(s, a) (n_a, n_b) = s #calculate the number of cars moved with sign indicating direction + being A to B, normally this is simply a but if we try to move more cars than are available, it will be capped carsmoved = if a > 0 min(a, n_a) elseif a < 0 -min(abs(a), n_b) else 0 end #cars above nmax are returned to the company but we still incur the cost of transferring them aftercount_a = min(n_a - carsmoved, nmax) aftercount_b = min(n_b + carsmoved, nmax) cost = (abs(a) - (a
> 0)*employeeshuttle)*movecost + (overnightpenalty * ((aftercount_a > maxovernight) + (aftercount_b > maxovernight))) #one free transfer from A to B if employee shuttle is true in modified version, overnight penalty if too many cars are left at a lot afterstate = (aftercount_a, aftercount_b) return (afterstate, -cost) end #create functions that map a state action pair to an afterstate and intermediate reward afterstate_map = zeros(Int64, length(actions), length(states)) reward_interim_map = zeros(Float32, length(actions), length(states)) for (i_s, s) in enumerate(states) for (i_a, a) in enumerate(actions) (afterstate, r_int) = get_afterstate_transition(s, a) afterstate_map[i_a, i_s] = afterstate_lookup[afterstate] reward_interim_map[i_a, i_s] = r_int end end out = zeros(Float32, length(states), length(rewards)) #calculate probability matrix for all the s′, r transitions given starting in afterstate y function fillmatrix!(out, s) #initialize the matrix for s′, r transitions; rows index the transition states s′ and columns index the rewards out .= 0f0 (aftercount_a, aftercount_b) = s for (i_s′, s′) in enumerate(states) (n_a′, n_b′) = s′ for n_rent_a in 0:aftercount_a for n_rent_b in 0:aftercount_b p_a = probabilities.location_A[((aftercount_a, n_a′), n_rent_a)] p_b = probabilities.location_B[((aftercount_b, n_b′), n_rent_b)] p_total = p_a*p_b r = rentcredit*(n_rent_a+n_rent_b) out[i_s′, reward_lookup[r]] += p_total end end end return out end #initialize probability functions with all zeros ptf = zeros(T, length(states), length(rewards), length(afterstates)) for (i_s, s) in enumerate(afterstates) ptf[:, :, i_s] .= fillmatrix!(out, s) end #find indices of the reward vector that never have nonzero probability inds = reduce(intersect, [findall(0 .== [sum(ptf[:, i, j]) for i in 1:size(ptf, 2)]) for j in 1:size(ptf, 3)]) goodinds = setdiff(eachindex(rewards), inds) FiniteAfterstateMDP(states, afterstates, actions, rewards[goodinds], ptf[:, goodinds, :], afterstate_map, 
reward_interim_map) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$de50f95f-984e-4387-958c-64e0265f5953„§cell_idÙ$de50f95f-984e-4387-958c-64e0265f5953¤codeÚwfunction render_walk(id; l = 5) l > 26 && error("Cannot render more than 26 states") names = Iterators.take('A':'Z', l) |> collect startstate = names[ceil(Int64, l/2)] makestate(s) = """

""" function combinestates(s1, s2) """ $s1

$s2 """ end @htl("""

$(HTML(mapreduce(makestate, combinestates, names)))

""") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$c8500b89-644d-407f-881a-bcbd7da23502„§cell_idÙ$c8500b89-644d-407f-881a-bcbd7da23502¤codeÙÎmd""" **Figure 6.3** Interim and asymptotic performance of TD control methods on the cliff-walking task as a function of α. Dashed lines represent interim performance and solid lines represent asymptotic performance. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$84d81413-6334-4965-8632-8a763cd3f28a„§cell_idÙ$84d81413-6334-4965-8632-8a763cd3f28a¤codeÚ€md""" Comparison of all learning methods and their double estimator counterparts on the simple MDP described in Section 6.7. Q-learning initially learns to take the left action much more often than the right action. It always takes it significantly more often than the 5% minimum probability enforced by $\epsilon$-greedy action selection with $\epsilon$=0.1. In contrast, Double Q-learning is essentially unaffected by maximization bias, as is Double Expected Sarsa. Sarsa and Expected Sarsa also exhibit maximization bias. All of the Sarsa methods eventually take the left action more often than Q-learning does, even though the behavior policy should be the same for both. Even Double Expected Sarsa, which has no maximization bias, shows the same tendency. The only difference between this method and Double Q-learning is the use of the $\epsilon$-greedy policy in the value calculation. Its action value estimates are therefore for the $\epsilon$-greedy policy rather than for the greedy policy that Double Q-learning evaluates. Under the $\epsilon$-greedy policy, the action actually taken is sometimes left even when right is greedy, and vice versa. Right is still the better choice even under the $\epsilon$-greedy policy. However, $\epsilon$ adds variance to the value estimates, so the behavior policy derived from the Q values takes longer to converge. That slower convergence is apparent in the graph above. 
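
Below is a minimal sketch of the double-estimator update that Double Q-learning uses, assuming the notebook's `Q[i_a, i_s]` layout. `double_q_update!` is a hypothetical helper, not the notebook's `double_q_learning`. The table being updated selects the greedy next action and the other table evaluates it. Noise that makes one table overestimate an action is therefore not also used to value that action. In the small example the two tables disagree about the best next action, and the target is 1.0 rather than the 5.0 that a single maximum over either table would give.

```julia
using Random

# Hypothetical sketch of one Double Q-learning update on tables indexed [i_a, i_s]
function double_q_update!(Q1::Matrix{T}, Q2::Matrix{T}, i_s, i_a, r::T, i_s′, α::T, γ::T;
                          terminal::Bool = false, rng = Random.default_rng()) where {T<:AbstractFloat}
    # With probability 0.5 update Q1 (evaluated by Q2), otherwise update Q2 (evaluated by Q1)
    (Qa, Qb) = rand(rng, Bool) ? (Q1, Q2) : (Q2, Q1)
    target = if terminal
        r
    else
        a_star = argmax(view(Qa, :, i_s′))   # select the next action with the table being updated
        r + γ * Qb[a_star, i_s′]             # evaluate it with the other table
    end
    Qa[i_a, i_s] += α * (target - Qa[i_a, i_s])
    return nothing
end

Q1 = [0.0 5.0; 0.0 1.0]   # Q1 prefers action 1 in state 2
Q2 = [0.0 1.0; 0.0 5.0]   # Q2 prefers action 2 in state 2
double_q_update!(Q1, Q2, 1, 1, 0.0, 2, 1.0, 1.0)
Q1[1, 1] + Q2[1, 1]       # 1.0: only one table changed, and its target was 1.0
```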
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$33d69db9-fa2b-40a3-bbed-21d5fd60f302„§cell_idÙ$33d69db9-fa2b-40a3-bbed-21d5fd60f302¤codeÚøfunction example_6_8(;loadfile = true) methods = [sarsa, expected_sarsa, double_expected_sarsa, q_learning, double_q_learning] names = ["Sarsa", "Expected Sarsa", "Double Expected Sarsa", "Q-learning", "Double Q-learning"] results1 = [f(noisy_gridworld, 0.1f0, 1.0f0, num_episodes = 5_000) for f in methods] displays = [show_gridworld_policy_value(noisy_gridworld, a; winds = fill(0, gridsize)) for a in results1] value_iteration_solution = begin_value_iteration_v(noisy_gridworld_dp, 1.0f0) v_true = last(first(value_iteration_solution)) value_iteration_display = show_gridworld_policy_value(noisy_gridworld, (v_true, last(value_iteration_solution))) if loadfile && isfile("example_6_8.bin") step_plot = deserialize("example_6_8.bin") else max_episodes = 20 num_samples = 10_000 steps = [(1:num_samples |> Map(_ -> f(noisy_gridworld, 0.01f0, 1.0f0, num_episodes = max_episodes)[3]) |> foldxt(+)) / num_samples for f in methods] step_traces = [scatter(x = 1:max_episodes, y = v, name = names[i]) for (i, v) in enumerate(steps)] step_plot = plot(step_traces, Layout(title = "Episode Length for Noisy Gridworld", xaxis_title = "Episodes", yaxis_title = "Steps per Episode", yaxis_type = "log")) serialize("example_6_8.bin", step_plot) end out = @htl("""

Value Iteration Solution $value_iteration_display

$(HTML(mapreduce(*, eachindex(displays)) do i """

$(names[i]) Solution $(displays[i])

""" end))

$(step_plot) """) return out end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$3f3ebc9b-b070-4d73-8be9-823b399c664c„§cell_idÙ$3f3ebc9b-b070-4d73-8be9-823b399c664c¤codeÚ#compute the value function for a policy π on an mdp using batch updating with a constant step size parameter α and a discount rate of γ. Must provide a tolerance ϵ, the largest change in the value function during a sweep over the batch that can be tolerated while still considering the value function converged. function batch_value_est(π::Matrix{T}, mdp::MDP_TD{S, A, F, G, H}, α::T, γ::T, ϵ::T; num_episodes::Integer = 1000, vinit::T = zero(T), save_states::Vector{S} = Vector{S}(), V::Vector{T} = initialize_state_value(mdp; vinit = vinit), estimation_method::BatchMethod = TD0(), maxcount = typemax(T)) where {T<:AbstractFloat, S, A, F, G, H} check_policy(π, mdp) terminds = findall(mdp.isterm(s) for s in mdp.states) V[terminds] .= zero(T) v_saves = zeros(T, length(save_states), num_episodes+1) errors = zeros(T, num_episodes) function update_saves!(v_saves, ep) for (i, s) in enumerate(save_states) i_s = mdp.statelookup[s] v_saves[i, ep] = V[i_s] end end update_saves!(v_saves, 1) #each tuple in this vector matches an output from the runepisode function saved_episodes = Vector{Tuple{Vector{S}, Vector{A}, Vector{T}}}() for n in 1:num_episodes push!(saved_episodes, runepisode(mdp, π)[1:end-1]) err = typemax(T) #wait until the error has converged count = zero(T) while (count < maxcount) && (err > ϵ) worst_error = zero(T) #update values for entire batch of episodes for ep in saved_episodes #update values for each episode in a batch and update the worst error worst_error = max(worst_error, update_value!(V, estimation_method, α, γ, mdp, ep...)) end err = worst_error count += 1 end errors[n] = err #only update saves after the value function has converged for this batch update_saves!(v_saves, n+1) end return V, v_saves, errors 
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$d5b612d8-82a1-4586-b721-1baaea2101cf„§cell_idÙ$d5b612d8-82a1-4586-b721-1baaea2101cf¤codeÚmd""" Value iteration with afterstates converged in 10 fewer steps than state value iteration, and its total runtime is less than 25% of the state value iteration runtime. So, as expected, the afterstate method converges in fewer steps, each of which is cheaper to compute than a step that uses the state value function. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06„§cell_idÙ$dee6b500-0ba1-4bbc-b217-cbb9ad47ad06¤codeÙ¦example_6_5(;mdp = make_windy_gridworld(actions = [king_actions; Stay()]), num_episodes = 400, action_display = action3_display, policy_display = display_king_policy)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$897fde24-9a4a-465e-96f2-dd9e8baab294„§cell_idÙ$897fde24-9a4a-465e-96f2-dd9e8baab294¤codeÙkshow_gridworld_policy_value(windy_gridworld, q_learning(windy_gridworld, 0.5f0, 1.0f0; num_episodes = 400))¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$1e3d231a-4065-48ce-a74e-018066fb232a„§cell_idÙ$1e3d231a-4065-48ce-a74e-018066fb232a¤codeÚŠfunction example_6_3(;l = 5, max_episodes = 100, nruns = 100, vinit = 0.5f0, α = 0.05f0, ϵ = α, kwargs...) #note that for this task the error tolerance is set to the step size because the only reward experienced is 1, so the smallest possible maximum value update is α anyway mrp = make_mrp(l = l) π = make_random_policy(mrp) true_values = collect(1:l) ./ (l+1) function get_errors(method) (v, v_saves, errors) = batch_value_est(π, mrp, α, 1.0f0, ϵ; num_episodes = max_episodes, vinit=vinit, save_states = collect(1:l), estimation_method = method, kwargs...) 
sqrt.(mean((v_saves .- true_values) .^2, dims = 1)) end mc_errors = mean([get_errors(MC()) for _ in 1:nruns])[:] td0_errors = mean([get_errors(TD0()) for _ in 1:nruns])[:] t1 = scatter(x = 0:max_episodes, y = mc_errors, name = "MC") t2 = scatter(x = 0:max_episodes, y = td0_errors, name = "TD") p = plot([t1, t2], Layout(xaxis_title = "Walks / Episodes", yaxis_title = "RMS error, averaged over states", title = "Batch Training")) md""" #### Figure 6.2 $p Performance of TD(0) and constant-Î± MC under batch training on the random walk task with $l states """ end ¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$0f22e85f-ed31-49df-a7c7-0579298f05fe„§cell_idÙ$0f22e85f-ed31-49df-a7c7-0579298f05fe¤codeÚJmd""" For Monte Carlo learning each state estimate is updated with the error shown by the red arrows only after the episode is finished. For TD(0) learning, as soon as the feedback from the subsequent state is received, the error can be calculated and it is only based on the new information from one state into the future. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$9017093c-a9c3-40ea-a9c6-881ee62fc379„§cell_idÙ$9017093c-a9c3-40ea-a9c6-881ee62fc379¤codeÚ…md""" > ### *Exercise 6.2* > This is an exercise to help develop your intuition about why TD methods are often more efficient than Monte Carlo methods. Consider the driving home example and how it is addressed by TD and Monte Carlo methods. Can you imagine a scenario in which a TD update would be better on average than a Monte Carlo update? Give an example scenario - a description of past experience and a current state - in which you would expect the TD update to be better. Here's a hint: Suppose you have lots of experience driving home from work. Then you move to a new building and a new parking lot (but you still enter the highway at the same place). Now you are starting to learn predictions for the new building. 
Can you see why TD updates are likely to be much better, at least initially, in this case? Might the same sort of thing happen in the original scenario? Originally, from the starting state, the expected total time to reach home is 30 minutes. Now if we change the route so that it now takes on average 5 more minutes to reach the car, but the expected elapsed time for every other leg of the journey is unchanged. Now our total time estimate should be 35 minutes from the starting state on average. Let's say we reach the car and nothing out of the ordinary is happening. The predicted time to go will be 25 minutes and the predicted total time will be 35 minutes. If nothing further out of the ordinary occurs, then only the first state will be corrected. For the Monte Carlo method, the only state with an estimate error will be the first state, but this update will not occur until after we've arrived at our destination. Either way, the next time we drive we will have a new, more accurate estimate reflecting the longer time required to reach the car. $(example_6_1(;elapsed = [0, 10, 20, 25, 32, 35], predicted_ttg = [30, 25, 15, 10, 3, 0])) In the example, during the drive several events occur during the journey that change the predicted and actual time from the average. For simplicity let's assume that when we enter our home street there is a garbage truck blocking our path. Normally it only takes 3 minutes to arrive at home, but with the truck present we estimate it will take 5 minutes (2 minutes longer). Now the total predicted time will be increased from 35 minutes to 37 minutes. In the case of Monte Carlo learning, this additional 2 minutes will propagate backwards to all of the previous states because we experienced a true travel time of 37 minutes rather than the 35 minutes predicted after the 2nd state and the 30 minutes predicted after the first state. For TD(0) learning, however, this delay will only impact the previous state after a single update. 
Effectively it will increase the predicted time spent on the final leg of the journey only. The prediction from the starting state will only be increased by the 5 minute increase from the walk to the car, not the delay from the garbage truck. Since we are actually starting from a new point, that feedback will be consistent and does reflect a true change in the expected time from the starting state. The garbage truck, however, may be a rare occurence. By the time this change propagates backwards through the states to the starting state, a lot more experience will be accummulated at all the other states and if Î± is some reasonable value, this delay will not be counted nearly as much as the updates from the first leg of the journey. Since TD(0) only uses feedback from one step into the future immediately, if changes are made to the environment, those changes will only affect the most closely related states immediately. In this example, all of the accurate predictions we still have about the later legs of the journey will be used to keep the predictions more stable. $(example_6_1(;elapsed = [0, 10, 20, 25, 32, 37], predicted_ttg = [30, 25, 15, 10, 5, 0])) The opposite extreme though could create a situation where the Monte Carlo updates were better. Imagine instead that you moved houses in the same neighborhood such that once you enter the home street, it takes 5 minutes to reach your home instead of 3 minutes. In this case, the Monte Carlo updates would move all of the state predictions up towards the 2 minute increase since all of the predictions would be too short. The TD(0) update though would initially only increase the prediction for the final leg of the journey and we would have to wait for this change to propagate backwards to all the other states. So the efficiency of updates for each method depends on where in the episode environmental changes occur. 
Actual environment change at the end of the route $(example_6_1(;elapsed = [0, 5, 15, 20, 27, 32], predicted_ttg = [30, 25, 15, 10, 3, 0])) Now there is a randomly experienced shorter leg at the start of the journey which won't affect most of the Monte Carlo updates. $(example_6_1(;elapsed = [0, 3, 13, 18, 25, 30], predicted_ttg = [30, 25, 15, 10, 3, 0])) """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$4b0d96d0-25d1-4fed-b105-c65fa2883a61„§cell_idÙ$4b0d96d0-25d1-4fed-b105-c65fa2883a61¤codeÙ%const mrp_6_2 = make_mrp(l = nstates)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$1115f3ec-f4b2-4fba-bd5e-321a63b10a6d„§cell_idÙ$1115f3ec-f4b2-4fba-bd5e-321a63b10a6d¤codeÙ¶show_gridworld_policy_value(king_gridworld, q_learning(king_gridworld, 0.1f0, 1.0f0; num_episodes = 2000); action_display = king_action_display, policy_display = display_king_policy)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$1e3b3234-3fe1-46c9-82b7-f729c656eb25„§cell_idÙ$1e3b3234-3fe1-46c9-82b7-f729c656eb25¤codeÚºmd""" $\begin{flalign} G_t - V_t(S_t) &= \delta_t + \gamma \eta_{t} + \gamma \left [\delta_{t+1} + \gamma \eta_{t+1} + \gamma (G_{t+2} - V_{t+2}(S_{t+2}) ) \right ] \\ &= \delta_t + \gamma \eta_{t} + \gamma \delta_{t+1} + \gamma^2 \eta_{t+1} + \gamma^2 \left [G_{t+2} - V_{t+2}(S_{t+2}) \right ] \\ &= (\delta_t + \gamma \eta_t) + \gamma (\delta_{t+1} + \gamma \eta_{t+1}) + \cdots + \gamma^{T-t-1}(\delta_{T-1} + \gamma \eta_{T-1}) + \gamma^{T-t} \left [G_T - V_T(S_T) \right ]\\ &= (\delta_t + \gamma \eta_t) + \gamma (\delta_{t+1} + \gamma \eta_{t+1}) + \cdots + \gamma^{T-t-1}(\delta_{T-1} + \gamma \eta_{T-1})\\ &=\sum_{k=t}^{T-1} \gamma^{k-t} (\delta_k + \gamma \eta_k)\\ \end{flalign}$ """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$6029990b-eb31-45ae-a869-b789fba673a6„§cell_idÙ$6029990b-eb31-45ae-a869-b789fba673a6¤codeÚmd""" To use afterstates with generalized policy iteration, we need to modify our MDP framework by considering the 
following trajectory: $$(S, A) \longrightarrow (Y, P) \longrightarrow (S^\prime, R) \longrightarrow \cdots \longrightarrow (S_T, R_T)$$ where $(S, A, R)$ are the usual state, action, and reward. We introduce $(Y, P)$ to indicate the afterstate and any intermediate reward that is received from the afterstate transition. The probability transition function for a normal MDP is written as $p(s^\prime, r \vert s, a)$ and represents the probability of transitioning to state $s$ with reward $r$ under the condition that an agent takes action $a$ from state $s$. When using afterstates, transitions can be represented with two functions: $p(y, \rho \vert s, a) \tag{a}$ is the probability of transitioning to afterstate $y$ with intermediate reward $\rho$ given an agent takes action $a$ from state $s$ $p(s^\prime, r \vert y) \tag{b}$ is the probability of transitioning to state $s^\prime$ with reward $r$ given an agent starts in afterstate $y$. Moreover, when an environment is modified to use afterstates, usually there are known deterministic dynamics that follow actions followed by some stochastic behavior after that. A good example is tic-tac-toe where we fully know the dynamics after making a move, but there could be some unknown behavior from the opponent. In this situation, the afterstate probability transition (a) is deterministic, so it could instead be represented by a mapping function that returns an afterstate and an intermediate reward given a state action pair. $$f_1(s, a) = y \tag{b1â€²}$$ $$f_2(s, a) = \rho \tag{b2â€²}$$ where $y$ and $\rho$ are the afterstate and reward respectively after taking action $a$ in state $s$. Now all of the stochastic dynamics of the environment are captured in (b) and the function only has 3 arguments instead of the usual 4. We can now apply all of the previous techniques to the afterstate example and even combine dynamic programming and trajectory sampling. 
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$61bbf9db-49a0-4709-83f4-44f228be09c0„§cell_idÙ$61bbf9db-49a0-4709-83f4-44f228be09c0¤codeÚõfunction sarsa(mdp::MDP_TD{S, A, F, G, H}, Î±::T, Î³::T; num_episodes = 1000, qinit = zero(T), Ïµinit = one(T)/10, Qinit = initialize_state_action_value(mdp; qinit=qinit), Ï€init = create_Ïµ_greedy_policy(Qinit, Ïµinit), history_state::S = first(mdp.states), update_policy! = (v, Ïµ, s) -> make_Ïµ_greedy_policy!(v, Ïµ), save_history = false, decay_Ïµ = false) where {S, A, F, G, H, T<:AbstractFloat} terminds = findall(mdp.isterm(s) for s in mdp.states) Q = copy(Qinit) Q[:, terminds] .= zero(T) Ï€ = copy(Ï€init) vhold = zeros(T, length(mdp.actions)) #keep track of rewards and steps per episode as a proxy for training speed rewards = zeros(T, num_episodes) steps = zeros(Int64, num_episodes) if save_history action_history = Vector{A}(undef, num_episodes) end for ep in 1:num_episodes Ïµ = decay_Ïµ ? Ïµinit/ep : Ïµinit s = mdp.state_init() (i_s, i_a, a) = init_step(mdp, Ï€, s) rtot = zero(T) l = 0 while !mdp.isterm(s) (sâ€², i_sâ€², r, aâ€², i_aâ€²) = sarsa_step(mdp, Ï€, s, a) if save_history && (s == history_state) action_history[ep] = a end Q[i_a, i_s] += Î± * (r + Î³*Q[i_aâ€², i_sâ€²] - Q[i_a, i_s]) #update terms for next step vhold .= Q[:, i_s] update_policy!(vhold, Ïµ, s) Ï€[:, i_s] .= vhold s = sâ€² a = aâ€² i_s = i_sâ€² i_a = i_aâ€² l+=1 rtot += r end steps[ep] = l rewards[ep] = rtot end default_return = Q, Ï€, steps, rewards save_history && return (default_return..., action_history) return default_return end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$814d89be-cfdf-11ec-3295-49a8f302bbcf„§cell_idÙ$814d89be-cfdf-11ec-3295-49a8f302bbcf¤codeÚOmd""" # Chapter 6 Temporal-Difference Learning TD methods combine the Monte Carlo concept of learning from experience with the self-consistency ideas from dynamic programming. 
Unlike the pure Monte Carlo methods of Chapter 5, TD methods do not require waiting for the final outcome of an episode to start learning. In other words they bootstrap learning by exploiting what is known about the properties of the value function. Eventually we will see that different degrees of bootstrapping can be used that bridge the gap between the techniques in Chapter 5 and 6. ## 6.1 TD Prediction """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$52aebb7b-c2a9-443f-bc03-24cd25793b32„§cell_idÙ$52aebb7b-c2a9-443f-bc03-24cd25793b32¤codeÚmd""" > ### *Exercise 6.4* > The specific results shown in the right graph of the random walk example are dependent on the value of the step-size parameter $\alpha$. Do you think the conclusions about which algorithm is better would be affected if a wider range of values were used? Is there a different, fixed value of $\alpha$ at which either algorithm would have performed significantly better than shown? Why or why not? Both algorithms should theoretically converge to the true values with a sufficiently small $\alpha$ and a large enough number of samples. Over this limited window of 100 episodes, an $\alpha$ that is too small might result in convergence so slow that it does not reach error as low as a larger $\alpha$. For the MC method, $\alpha=0.01$ is the smallest value and it has the slowest convergence over this range. $\alpha=0.04$ is the largest value tested, and it results in approximately the same error after 100 episodes. The intermediate values show better performance over this number of episodes indicating that the best possible performance is already captured in this interval. For the TD method, the best results shown are for $\alpha=0.05$ which is already the smallest value with the slowest convergence rate. An even smaller value might result in a better outcome over 100 episodes, but this performance is already better than anything observed for the MC method. 
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8„§cell_idÙ$3d8b1ccd-9bb3-42f2-a77a-6afdb72c1ff8¤codeÚ&#calculate the percentage error for a value update handling cases of zero values function calc_error(v_old::T, v_new::T) where T<:AbstractFloat d = v_new - v_old return abs(d) f(x) = x <= eps(one(T)) f(d) && f(v_old) && return zero(T) f(v_old) && return typemax(T) abs(d) / abs(v_old) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$031e1106-7408-4c7e-b78e-b713c19123d1„§cell_idÙ$031e1106-7408-4c7e-b78e-b713c19123d1¤codeÚ¼begin struct UpRight <: GridworldAction end struct DownRight <: GridworldAction end struct UpLeft <: GridworldAction end struct DownLeft <: GridworldAction end const diagonal_actions = [UpRight(), UpLeft(), DownRight(), DownLeft()] const king_actions = [rook_actions; diagonal_actions] move(::UpRight, x, y) = (x+1, y+1) move(::UpLeft, x, y) = (x-1, y+1) move(::DownRight, x, y) = (x+1, y-1) move(::DownLeft, x, y) = (x-1, y-1) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$7035c082-6e50-4df5-919f-5f09d2011b4a„§cell_idÙ$7035c082-6e50-4df5-919f-5f09d2011b4a¤codeÙXrunepisode(mdp::MDP_TD; kwargs...) 
= runepisode(mdp, make_random_policy(mdp); kwargs...)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$bfe71b40-3157-47df-8494-67f8eb8e4e93„§cell_idÙ$bfe71b40-3157-47df-8494-67f8eb8e4e93¤codeÚúfunction runepisode(mdp::MDP_TD{S, A, F, G, H}, Ï€::Matrix{T}; max_steps = Inf) where {S, A, F, G, H, T<:Real} states = Vector{S}() actions = Vector{A}() rewards = Vector{T}() s = mdp.state_init() step = 1 #note that the terminal state will not be added to the state list while !mdp.isterm(s) && (step <= max_steps) push!(states, s) (i_s, i_sâ€², r, sâ€², a, i_a) = takestep(mdp, Ï€, s) push!(actions, a) push!(rewards, r) s = sâ€² step += 1 end return states, actions, rewards, s end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$b35264b0-ac5b-40ce-95e4-9b2bc4cb106f„§cell_idÙ$b35264b0-ac5b-40ce-95e4-9b2bc4cb106f¤codeÚªmd""" TD(0) update rule for action values: $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma Q(S_{t+1}, A_{t+1})-Q(S_t, A_t)]$ This update is done after every transition from a nonterminal state $S_t$. If $S_{t+1}$ is terminal, then $Q(S_{t+1}, A_{t+1})$ is defined as zero. This rule uses every element of the quintuple of events, $(S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})$, that make up a transition from one state-action pair to the next. This quintuple gives rise to the name *Sarsa* for the algorithm. Each update only uses the immediate reward and the value of the state-action pair in the subsequent state as illustrated in the backup diagram shown below. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$d259ecca-0249-4b28-a4d7-6880d4d84495„§cell_idÙ$d259ecca-0249-4b28-a4d7-6880d4d84495¤codeÚHconst action3_display = @htl("""

Actions

""")¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$22c4ce8c-bd82-4eb3-8af5-55342018edff„§cell_idÙ$22c4ce8c-bd82-4eb3-8af5-55342018edff¤codeÙ$md""" # Dynamic Programming Code """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$6faa3015-3ac4-44af-a78c-10b175822441„§cell_idÙ$6faa3015-3ac4-44af-a78c-10b175822441¤codeÙ$const cliffworld = make_cliffworld()¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$fa04d20f-6e3f-46f8-b3f7-a543d1fa360a„§cell_idÙ$fa04d20f-6e3f-46f8-b3f7-a543d1fa360a¤codeÚfunction max_bias_visualization(;nvars_min = 2, nvars_max = 10, nmax = 10, nruns = 10_000) varlist = collect(nvars_min:nvars_max) estimates = mapreduce(+, 1:nruns) do _ data = randn(nmax, nvars_max) means = reduce(hcat, [cum_mean(c) for c in eachcol(data)]) maxes = reduce(vcat, [cum_max(r)[2:end]' for r in eachrow(means)]) end ./ nruns traces = [scatter(x = 1:nmax, y = c, name = "$(varlist[i]) variables") for (i, c) in enumerate(eachcol(estimates))] true_trace = scatter(x = 1:nmax, y = fill(0.0, nmax), name = "True Value", line_dash = "dash", mode = "lines", line_color = "black") plot([true_trace; traces], Layout(xaxis_title = "Number of Samples Per Variable", yaxis_title = "Estimate of Maximum Mean", title = "Maximization Bias for IID Variables with Zero Mean")) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$297f1606-4ec2-4075-9f81-926dc517b76f„§cell_idÙ$297f1606-4ec2-4075-9f81-926dc517b76f¤codeÙqconst noisy_gridworld_dp = create_noisy_gridworld_mdp(noisy_gridworld, first(noisy_rewards), last(noisy_rewards))¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$f2776908-d06a-4073-b2ce-ecbf109c9cc7„§cell_idÙ$f2776908-d06a-4073-b2ce-ecbf109c9cc7¤code»md""" #### King Actions """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$d83ff60f-8973-4dc1-9358-5ad109ea5490„§cell_idÙ$d83ff60f-8973-4dc1-9358-5ad109ea5490¤codeÙÃmd""" ### Solutions on Noisy Gridworld Load Existing Results if Present: $(@bind ex_6_8_load 
CheckBox(default=true)) If file does not load correctly, uncheck this box to produce new results. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$105c5c23-270d-437e-89dd-12297814c6e0„§cell_idÙ$105c5c23-270d-437e-89dd-12297814c6e0¤codeÚ™md""" > ### *Exercise 6.6* > In Example 6.2 we stated that the true values for the random walk example are 1/6 , 2/6 , 3/6 , 4/6 , and 5/6 , for states A through E. Describe at least two different ways that these could have been computed. Which would you guess we actually used? Why? ###### Method 1: Set up the following system of equations that represent the relationship between state values $\begin{flalign} V(A) &= \frac{0+V(B)}{2} \implies 2V(A)=V(B) \\ V(B) &= \frac{V(A)+V(C)}{2} \implies 2V(B) = V(A)+V(C)\\ V(C) &= \frac{V(B)+V(D)}{2} \implies 2V(C)=V(B)+V(D)\\ V(D) &= \frac{V(C)+V(E)}{2} \implies 2V(D)=V(C)+V(E)\\ V(E) &= \frac{V(D)+1}{2} \implies 2V(E)=V(D)+1\\ \end{flalign}$ We can work down from the top equation expressing everything in terms of A. For shorter expressions $V(A)$ will be written below as $A$ and likewise for other states: $\begin{flalign} B&=2A \\ 2B&=A+C \implies C = 3A \\ 2C&=B+D \implies D = 6A-2A=4A \\ 2D&=C+E \implies E = 8A-3A = 5A \\ 2E &= D + 1 \implies 10A = 4A + 1 \implies A = \frac{1}{6} \end{flalign}$ Now that we have the value for A, all the others are trivial multiplications of it from 2 to 5. ###### Method 2: Calculate each value from probability of each trajectory With this method to get $V(A)$ we would write down every possible trajectory to a terminal state with the associated probability of each. Since trajectories terminating to the left have a value of 0, we only need to add up the trajectories that terminate to the right. Below are some examples for state A. 
$V(A) = 0.5^5 + 4 \times 0.5^7 + \cdots$ This equation represents the single trajectory that takes 5 steps to the right each with probability one half and the 4 possible trajectories that turn around once on the way right resulting in 7 steps. This sum will end up being infintely long to account for all of the trajectories that bounce back and forth arbitrarily large amounts of time. This method is significantly harder to calculate for each state compared to the first method and is more in line with how estimates are calculated with MC sampling. The first method is more analogous to TD sampling using the bootstrapped form of the Bellman equation. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$e8f94345-9ad5-48d4-8709-d796fb55db3f„§cell_idÙ$e8f94345-9ad5-48d4-8709-d796fb55db3f¤code¸exercise_6_5(Î± = 0.2f0)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$64b210e8-223f-41f7-a6b7-8af6183ddf87„§cell_idÙ$64b210e8-223f-41f7-a6b7-8af6183ddf87¤codeÚAfunction make_noisy_gridworld(;actions = rook_actions, l = 3) xmax = l ymax = l make_windy_gridworld(;actions = actions, apply_wind = (w, x, y) -> (x, y), xmax = xmax, ymax = ymax, sterm = GridworldState(xmax, ymax), start = GridworldState(1, 1), winds = fill(0, xmax), get_step_reward = () -> rand(noisy_rewards)) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4„§cell_idÙ$2f4e2da2-b1a1-41b1-8904-39b59f426da4¤codeÙ†const king_gridworld_mdp_dp = create_gridworld_mdp(10, 7, GridworldState(1, 4), GridworldState(8, 4), wind_vals, king_actions, -1.0f0)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$bc8bad61-a49a-47d6-8fa6-7dcf6c221910„§cell_idÙ$bc8bad61-a49a-47d6-8fa6-7dcf6c221910¤codeÚÕfunction example_6_1(;elapsed = [0, 5, 20, 30, 40, 43], predicted_ttg = [30, 35, 15, 10, 3, 0]) states = [:leaving, :reach_car, :exit_highway, :snd_rd, :home_st, :arrive] tt = last(elapsed) predicted_tt = predicted_ttg .+ elapsed actual_tt = fill(tt, 6) t1 = 
scatter(x = states, y = predicted_tt, line_color = "black", name = "actual outcome") t1â€² = scatter(x = states, y = predicted_tt, line_color = "black", name = "actual outcome", showlegend=false) t2 = scatter(x = states, y = actual_tt, mode = "lines", line = attr(dash = "dash", color = "black"), name = "Monte Carlo Prediction") errortraces = [scatter(x = [s, s], y = [e, tt], line = attr(color = "red"), marker = attr(symbol = "arrow-bar-up", angleref = "previous"), showlegend = false, name = "Mone Carlo Error") for (s, e) in zip(states, predicted_tt)] p1 = plot([t1; t2; errortraces], Layout(xaxis_title = "State", yaxis_title = "Predicted total
travel time", xaxis_ticktext = ["leaving office", "reach car", "exiting highway", "2ndary road", "home street", "arrive home"], xaxis_tickvals = states, width = 600, legend_orientation = "h", legend_y = 1.1)) td_prediction = [predicted_tt[2:end]; tt] t3 = scatter(x = states, y = td_prediction, name = "TD(0) Prediction", mode = "lines", line = attr(dash = "dash", color = "black", shape = "hv")) tderrors = [scatter(x = [states[i], states[i]], y = [predicted_tt[i], td_prediction[i]], line = attr(color = "red"), marker = attr(symbol = "arrow-bar-up", angleref = "previous"), showlegend = false, name = "TD(0) Error") for i in eachindex(states)] p2 = plot([t1â€²; t3; tderrors], Layout(xaxis_title = "State", xaxis_ticktext = ["leaving office", "reach car", "exiting highway", "2ndary road", "home street", "arrive home"], xaxis_tickvals = states, width = 600, showlegend = false)) [p1 p2] # plot(predicted_tt, xticks = (1:6, String.(states)), ylabel = "Minutes", lab = "Preicted Outcome", size = (680, 400)) # plot!(fill(43, 6), line = :dot, lab = "actual outcome") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$2455742f-dc18-4d6b-9f58-5666adac6919„§cell_idÙ$2455742f-dc18-4d6b-9f58-5666adac6919¤codeÚfunction create_car_rental_mdp(;nmax=20, Î»s::@NamedTuple{request_A::T, request_B::T, return_A::T, return_B::T} = (request_A = 3f0, request_B = 4f0, return_A = 3f0, return_B = 2f0), movecost::T = 2f0, rentcredit::T = 10f0, movemax::Integer=5, maxovernight::Integer = 20, overnightpenalty::T = 4f0, employeeshuttle = false) where T <: Real #enumerate all states states = [(n_a, n_b) for n_a in 0:nmax for n_b in 0:nmax] actions = collect(-movemax:movemax) #enumerate all rewards by simply incrementing by 1 dollar from the worst to best case scenario rewards = collect(-movecost*movemax - 2*overnightpenalty:rentcredit*nmax*2) reward_lookup = Dict(zip(rewards, eachindex(rewards))) #mapping from rewards to the proper index #create a lookup for the probability of starting with 
n cars at the start of the day and ending up with nâ€² at the end of the day function create_probability_lookup(Î»_request, Î»_return) #can only rent from 0 to n cars. if requests exceed n, all of those situations are equivalent and the probability is 1 - p(x < n-1) p_rent = Dict(n_request => poisson(n_request, Î»_request) for n_request in 0:nmax-1) #car returns can be any number greater than or equal to 0, but all returns of nmax - (n - nrent) or more will result in the same state which is max cars p_return = Dict(n_return => poisson(n_return, Î»_return) for n_return in 0:nmax-1) #initialize probabilities for each final value at 0 prob_lookup = Dict((t, nrent) => 0f0 for t in states for nrent in 0:t[1]) for n in 0:nmax for n_rent in 0:n-1 for n_return in 0:(nmax - n + n_rent - 1) nâ€² = n - n_rent + n_return p = p_rent[n_rent]*p_return[n_return] prob_lookup[((n, nâ€²), n_rent)] += p end prob_lookup[((n, nmax), n_rent)] += p_rent[n_rent]*(1 - sum(p_return[n_return] for n_return in 0:nmax-n+n_rent-1; init = zero(T))) end for n_return in 0:(nmax - 1) nâ€² = n_return p = (1 - sum(p_rent[n_rent] for n_rent in 0:n-1; init = zero(T)))*p_return[n_return] prob_lookup[((n, nâ€²), n)] += p end prob_lookup[((n, nmax), n)] += (1 - sum(p_rent[n_rent] for n_rent in 0:n-1; init = zero(T)))*(1 - sum(p_return[n_return] for n_return in 0:nmax-1, init = zero(T))) end return prob_lookup end probabilities = (location_A = create_probability_lookup(Î»s.request_A, Î»s.return_A), location_B = create_probability_lookup(Î»s.request_B, Î»s.return_B)) #calculate probability matrix for all the sâ€², r transitions given starting in state s and taking action a function getmatrix(s, a) #initialize the matrix for sâ€², r transitions, each column runs over the transition states out = zeros(length(states), length(rewards)) (n_a, n_b) = s #calculate the number of cars moved with sign indicating direction + being A to B, normally this is simply a but if we try to move more cars than are available, it 
will be capped carsmoved = if a > 0 min(a, n_a) elseif a < 0 -min(abs(a), n_b) else 0 end #cars above nmax are returned to the company but we still incur the cost of transfering them aftercount_a = min(n_a - carsmoved, nmax) aftercount_b = min(n_b + carsmoved, nmax) cost = (abs(a) - (a > 0)*employeeshuttle)*movecost + (overnightpenalty * ((aftercount_a > maxovernight) + (aftercount_b > maxovernight))) #one free transfer from A to B if employee shuttle is true in modified version, overnight penalty if too many cars are left at a lot for (i_sâ€², sâ€²) in enumerate(states) (n_aâ€², n_bâ€²) = sâ€² for n_rent_a in 0:aftercount_a for n_rent_b in 0:aftercount_b p_a = probabilities.location_A[((aftercount_a, n_aâ€²), n_rent_a)] p_b = probabilities.location_B[((aftercount_b, n_bâ€²), n_rent_b)] p_total = p_a*p_b r = rentcredit*(n_rent_a+n_rent_b) - cost out[i_sâ€², reward_lookup[r]] += p_total end end end return out end #initialize probability function with all zeros ptf = zeros(T, length(states), length(rewards), length(actions), length(states)) for (i_s, s) in enumerate(states) for (i_a, a) in enumerate(actions) ptf[:, :, i_a, i_s] .= getmatrix(s, a) end end #find indices of the reward vector that never have non zero probability inds = reduce(intersect, [findall(0 .== [sum(ptf[:, i, j, k]) for i in 1:size(ptf, 2)]) for j in 1:size(ptf, 3) for k in 1:size(ptf, 4)]) goodinds = setdiff(eachindex(rewards), inds) FiniteMDP(states, actions, rewards[goodinds], ptf[:, goodinds, :, :]) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$f474fcbd-e3c3-49fd-a6b7-6d6a8a7dda09„§cell_idÙ$f474fcbd-e3c3-49fd-a6b7-6d6a8a7dda09¤codeÙ%md""" ### Informal Proof for Bias """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$69eedbfd-396f-4461-b7a1-c36abc094581„§cell_idÙ$69eedbfd-396f-4461-b7a1-c36abc094581¤codeÚæfunction example_6_7_mdp(;num_actions::Integer = 10, num_episodes = 300, nruns = 10_000, Î± = 0.1f0, Ïµ = 0.1f0, load_file = true, fname = "figure_6_5.bin") 
load_file && isfile(fname) && begin p = deserialize(fname) return p end states = [A(), B(), Term()] actions = collect(1:num_actions) function step(::A, a) a == 1 && return (0.0f0, B()) a == 2 && return (0.0f0, Term()) return (-100f0, Term()) end step(::B, a) = (randn(Float32) - 0.1f0, Term()) state_init() = A() isterm(::Term) = true isterm(s) = false mdp = MDP_TD(states, actions, state_init, step, isterm) function get_valid_inds(i_s) i_s == 1 && return 1:2 return 1:num_actions end #in state A don't include actions other than left and right as random choices update_behavior!(v, Ïµ, ::A) = make_Ïµ_greedy_policy!(v, Ïµ; valid_inds = 1:2) update_behavior!(v, Ïµ, s) = make_Ïµ_greedy_policy!(v, Ïµ) Qinit = [[[0.0f0, 0.0f0]; fill(-100f0, num_actions-2)] zeros(Float32, num_actions) zeros(Float32, num_actions)] Ï€init = create_Ïµ_greedy_policy(Qinit, Ïµ; get_valid_inds = get_valid_inds) sarsa_results = mean(last(sarsa(mdp, 0.1f0, 1.0f0; num_episodes = num_episodes, save_history = true, Ïµinit = Ïµ, Qinit = Qinit, Ï€init = Ï€init, update_policy! = update_behavior!)) .== 1 for _ in 1:nruns) q_learning_results = mean(last(q_learning(mdp, 0.1f0, 1.0f0; num_episodes = num_episodes, save_history = true, Ïµinit = Ïµ, Qinit = Qinit, Ï€init = Ï€init, update_policy! = update_behavior!)) .== 1 for _ in 1:nruns) double_q_learning_results = mean(last(double_q_learning(mdp, 0.1f0, 1.0f0; num_episodes = num_episodes, save_history = true, Ïµinit = Ïµ, Qinit = Qinit, Ï€init_behavior = Ï€init, behavior_policy_function! = update_behavior!)) .== 1 for _ in 1:nruns) expected_sarsa_results = mean(last(expected_sarsa(mdp, 0.1f0, 1.0f0; Ïµinit = Ïµ, num_episodes = num_episodes, save_history = true, Qinit = Qinit, Ï€init = Ï€init, update_policy! = update_behavior!)) .== 1 for _ in 1:nruns) double_expected_sarsa_results = mean(last(double_expected_sarsa(mdp, 0.1f0, 1.0f0; Ïµinit = Ïµ, num_episodes = num_episodes, save_history = true, Qinit = Qinit, Ï€init_behavior = Ï€init, behavior_policy_function! 
= update_behavior!, target_policy_function! = update_behavior!)) .== 1 for _ in 1:nruns) optimal_trace = scatter(x = 1:num_episodes, y = fill(Ïµ / 2, num_episodes), name = "optimal", line_dash = "dash") t0 = scatter(x = 1:num_episodes, y = sarsa_results, name = "Sarsa") t1 = scatter(x = 1:num_episodes, y = q_learning_results, name = "Q-learning") t2 = scatter(x = 1:num_episodes, y = double_q_learning_results, name = "Double Q-learning") t4 = scatter(x = 1:num_episodes, y = double_expected_sarsa_results, name = "Double Expected Sarsa") t3 = scatter(x = 1:num_episodes, y = expected_sarsa_results, name = "Expected Sarsa") # plot([t0, t1, t2, t3]) traces = [t0, t1, t2, t3, t4, optimal_trace] p = plot(traces, Layout(xaxis_title = "Episodes", yaxis_title = "% left actions from A")) serialize(fname, p) return p end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$7ac99619-5232-4db8-8553-d79ea5415d29„§cell_idÙ$7ac99619-5232-4db8-8553-d79ea5415d29¤codeÚkfunction create_gridworld_mdp(mdp::MDP_TD, step_reward) #this only works when the mdp is deterministic. 
add a version for the stochastic wind example ptf = zeros(Float32, length(mdp.states), 2, length(mdp.actions), length(mdp.states)) for s in mdp.states i_s = mdp.statelookup[s] if mdp.isterm(s) ptf[i_s, 1, :, i_s] .= 1.0f0 else for a in mdp.actions (r, sâ€²) = mdp.step(s, a) i_a = mdp.actionlookup[a] i_sâ€² = mdp.statelookup[sâ€²] i_s = mdp.statelookup[s] ptf[i_sâ€², 2, i_a, i_s] = 1.0f0 end end end FiniteMDP(mdp.states, mdp.actions, [0.0f0, step_reward], ptf) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$0163763b-a15f-447e-b3d2-32d4bf9d2605„§cell_idÙ$0163763b-a15f-447e-b3d2-32d4bf9d2605¤codeÙ–@bind max_visual_params2 PlutoUI.combine() do Child md""" Number of Variables: $(Child(:nvars, NumberField(2:100, default = 2))) """ end |> confirm¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$53145cc2-784c-468b-8e91-9bb7866db218„§cell_idÙ$53145cc2-784c-468b-8e91-9bb7866db218¤codeÙr@bind t PlutoUI.Clock(interval = delay, max_value = length(mrp_trajectory[1])+5, repeat=true, start_running=false)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$6b496582-cc0e-4195-87ef-94792b0fff54„§cell_idÙ$6b496582-cc0e-4195-87ef-94792b0fff54¤codeÚ{function make_Ïµ_greedy_policy!(v::AbstractVector{T}, Ïµ::T; valid_inds = eachindex(v)) where T <: Real vmax = maximum(v[i] for i in valid_inds) v .= T.(isapprox.(v, vmax)) s = sum(v) c = s * Ïµ / length(valid_inds) d = one(T)/s - Ïµ #value to add to actions that are maximizing for i in valid_inds if v[i] == 1 v[i] = d + c else v[i] = c end end return v end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$9db7a268-1e6d-4366-a0ec-ebf54916d3b0„§cell_idÙ$9db7a268-1e6d-4366-a0ec-ebf54916d3b0¤code¸example_6_2(l = nstates)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$c2f56287-9a3e-454a-9ec1-53184b788db9„§cell_idÙ$c2f56287-9a3e-454a-9ec1-53184b788db9¤codeÙ-const jacks_car_mdp = 
create_car_rental_mdp()¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$18e60b1d-97ec-432c-a388-003e7fae415f„§cell_idÙ$18e60b1d-97ec-432c-a388-003e7fae415f¤codeÚfunction bellman_optimal_value!(V::Vector{T}, mdp::FiniteAfterstateMDP{T, S1, S2, A}, Î³::T) where {T <: Real, S1, S2, A} delt = zero(T) q_vec = zeros(T, length(mdp.actions)) @inbounds @fastmath @simd for i_y in eachindex(mdp.afterstates) q_total = zero(T) r_total = zero(T) @inbounds @fastmath @simd for i_sâ€² in eachindex(mdp.states) p_total = zero(T) q_vec .= mdp.reward_interim_map[:, i_sâ€²] .+ V[mdp.afterstate_map[:, i_sâ€²]] q_max = maximum(q_vec) @inbounds @fastmath for (i_r, r) in enumerate(mdp.rewards) p = mdp.ptf[i_sâ€², i_r, i_y] r_total += p*r p_total += p end q_total += q_max*p_total end v_new = r_total + Î³*q_total delt = max(delt, abs(v_new - V[i_y]) / (eps(abs(V[i_y])) + abs(V[i_y]))) V[i_y] = v_new end return delt end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6„§cell_idÙ$12c5efe4-d64d-4b82-877c-29b0e537fee6¤codeÙBbegin start_mrp mrp_trajectory = runepisode(mrp_6_2, Ï€_mrp) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6b„§cell_idÙ$a72d07bf-e337-4bd4-af5c-44d74d163b6b¤codeÙ'exercise_6_5(Î± = 0.2f0, vinit = 0.0f0)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$0201ae9f-4a31-497e-86ab-62b454ca85de„§cell_idÙ$0201ae9f-4a31-497e-86ab-62b454ca85de¤codeÙÖmd""" Notice that for $\alpha$ above roughly 0.25, Q-learning sometimes has diverging value estimates and therefore episodes that fail to terminate, whereas Double Q-learning avoids that problem even at large learning rates. 
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$b37f2395-1480-4c7c-b6c0-eba391e969d7„§cell_idÙ$b37f2395-1480-4c7c-b6c0-eba391e969d7¤codeÚ gmd""" Let's first consider the prediction problem for afterstates: how to compute the afterstate value function and how it could be used for policy improvement. We will use the notation $W(y)$ for the value of afterstate $y$, while $V(s)$ still means the value of state $s$. From the earlier definitions, we can show the relationship between the state and afterstate value functions. Recall that: $\begin{flalign} G_t &\doteq R_t + \gamma R_{t+1} + \cdots \\ V_\pi(s) &\doteq \mathbb{E}_\pi[G_t \mid S_t = s] \\ & = \mathbb{E}_\pi[R_t + \gamma V_\pi(S_{t+1}) \mid S_t = s] \\ &= \sum_a \pi(a \vert s) \sum_{r, s^\prime} p(r, s^\prime \vert s, a) \left ( r + \gamma V_\pi(s^\prime) \right ) \end{flalign}$ Representing the trajectory with afterstates and only considering the reward following an afterstate, we also know that: $\begin{flalign} G_t &\doteq R_t + \gamma(P_{t+1} + R_{t+1} + \gamma(P_{t+2} + R_{t+2} + \cdots))\\ W_\pi(y) &\doteq \mathbb{E}_\pi[G_t \mid Y_t = y] \\ & = \mathbb{E}_\pi[R_t + \gamma \left (P_{t+1} + W_\pi(Y_{t+1}) \right ) \mid Y_t = y] \\ &= \sum_{r, s^\prime} p(r, s^\prime \vert y) \left [r + \gamma \sum_{a^\prime} \left [ \pi(a^\prime \vert s^\prime) \left ( f_2(s^\prime, a^\prime) + W_\pi(f_1(s^\prime, a^\prime)) \right ) \right ] \right ] \end{flalign}$ Notice that, compared to the state value function, the policy only matters for this expected value when we consider the action taken from the next state $s^\prime$. The initial transition from the afterstate to $s^\prime$ depends only on the afterstate transition function $p(r, s^\prime \vert y)$, which is conditioned only on the afterstate. 
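
To make the afterstate backup concrete, here is a minimal Julia sketch of iterative policy evaluation for $W_\pi$ that applies the last line of the equation above to every afterstate until the values stop changing. The array layout (`p`, `rewards`, `policy`, `f1`, `f2`) and the function names `afterstate_backup` and `evaluate_afterstates` are hypothetical and are not the `FiniteAfterstateMDP` type used elsewhere in this notebook.

```julia
# Hypothetical tabular layout (not this notebook's FiniteAfterstateMDP type):
#   p[i_r, i_s, i_y]  probability of reward rewards[i_r] and next state i_s from afterstate i_y
#   policy[i_a, i_s]  probability of taking action i_a in state i_s
#   f1[i_a, i_s]      index of the afterstate reached by taking action i_a in state i_s
#   f2[i_a, i_s]      interim reward received for that action
function afterstate_backup(W, p, rewards, policy, f1, f2, gamma, i_y)
    total = 0.0
    for i_s in axes(p, 2), (i_r, r) in enumerate(rewards)
        prob = p[i_r, i_s, i_y]
        prob == 0 && continue
        # policy-weighted interim reward plus the value of the next afterstate
        next_value = sum(policy[i_a, i_s] * (f2[i_a, i_s] + W[f1[i_a, i_s]]) for i_a in axes(policy, 1))
        total += prob * (r + gamma * next_value)
    end
    return total
end

# Repeat synchronous sweeps until the largest change falls below tol
function evaluate_afterstates(p, rewards, policy, f1, f2, gamma; tol = 1e-10, maxiter = 10_000)
    W = zeros(size(p, 3))
    for _ in 1:maxiter
        Wnew = [afterstate_backup(W, p, rewards, policy, f1, f2, gamma, i_y) for i_y in eachindex(W)]
        delta = maximum(abs.(Wnew .- W))
        W = Wnew
        delta < tol && break
    end
    return W
end

# Toy check: one state, one afterstate, two equally likely actions with interim rewards 0 and 2
rewards = [1.0]
p = ones(1, 1, 1)                  # the afterstate always leads to state 1 with reward 1
policy = reshape([0.5, 0.5], 2, 1)
f1 = reshape([1, 1], 2, 1)         # both actions lead back to afterstate 1
f2 = reshape([0.0, 2.0], 2, 1)
W = evaluate_afterstates(p, rewards, policy, f1, f2, 0.5)
```

In the toy problem the fixed point satisfies $W = 1 + 0.5\,(0.5 \cdot 0 + 0.5 \cdot 2 + W)$, so the sweeps should settle at $W \approx 3$. Only the action choice at $s^\prime$ involves the policy, matching the observation above.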
Recall that to improve a policy $\pi$ for which we have a value function $V_\pi$, we must select the greedy policy with respect to $V_\pi$, meaning $\pi^{\prime} (s) = \mathrm{argmax}_a \sum_{r, s^\prime} p(r, s^\prime \vert s, a)(r + \gamma V_\pi(s^\prime))$. If we do not have access to the full transition probability function, we cannot compute this explicitly. Nor can we estimate it from a single trajectory, because from each state we only observe the one transition produced by the behavior policy at that time. That's why, for MDPs that do not provide the full transition function, we prefer to estimate the state-action value function $Q(s, a)$, because with that function policy improvement reduces to $\pi^{\prime} (s) = \mathrm{argmax}_a Q(s, a)$. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$6edb550d-5c9f-4ea6-8746-6632806df11e„§cell_idÙ$6edb550d-5c9f-4ea6-8746-6632806df11e¤codeexample_6_1()¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$01582b3b-c4d0-4691-9edf-f77e6d8be2c9„§cell_idÙ$01582b3b-c4d0-4691-9edf-f77e6d8be2c9¤codeÙDmd""" ### Maximization Bias Visualization for a Single Estimator """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$7ed07ddc-1c63-4ce7-bfd3-6da54304d297„§cell_idÙ$7ed07ddc-1c63-4ce7-bfd3-6da54304d297¤codeÚÝfunction makepolicyvaluemaps(mdp::CompleteMDP, v::Vector{T}, Ï€::Matrix{T}) where T <: Real function getaction(dist) #default action will be 0 sum(dist) == 0 && return 0 (p, ind) = findmax(dist) mdp.actions[ind] end policymap = zeros(Int64, 21, 21) valuemap = zeros(T, 21, 21) for i in 1:size(Ï€, 2) action = getaction(view(Ï€, :, i)) (n_a, n_b) = mdp.states[i] policymap[n_a+1, n_b+1] = action valuemap[n_a+1, n_b+1] = v[i] end (policymap, valuemap) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$4862942b-d1e2-4ac8-8e88-65205e91a070„§cell_idÙ$4862942b-d1e2-4ac8-8e88-65205e91a070¤codeÚc@bind max_visual_params PlutoUI.combine() do Child md""" ||| |---|---| 
|Maximum Number of Variables:|$(Child(:nvars, NumberField(2:100, default = 4)))| |Maximum Number of Samples Per Variable:| $(Child(:nmax, NumberField(10:1000, default = 100)))| |Number of Runs:| $(Child(:nruns, NumberField(100:1_000_000, default = 10_000)))| """ end |> confirm¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$a5009785-64b4-489b-a967-f7840b4a9463„§cell_idÙ$a5009785-64b4-489b-a967-f7840b4a9463¤codeÙ-md""" #### Random Walk Visualization Code """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$eb735ead-978b-409c-8990-b5fa7a027ebf„§cell_idÙ$eb735ead-978b-409c-8990-b5fa7a027ebf¤codeÚfunction tabular_TD0_pred_V(Ï€::Matrix{T}, mdp::MDP_TD{S, A, F, G, H}, Î±::T, Î³::T; num_episodes::Integer = 1000, vinit::T = zero(T), V::Vector{T} = initialize_state_value(mdp; vinit = vinit), save_states::Vector{S} = Vector{S}()) where {T <: AbstractFloat, S, A, F, G, H} check_policy(Ï€, mdp) terminds = findall(mdp.isterm(s) for s in mdp.states) #initialize counts = zeros(Integer, length(mdp.states)) V[terminds] .= zero(T) #terminal state must always have 0 value v_saves = zeros(T, length(save_states), num_episodes+1) function updatesaves!(ep) for (i, s) in enumerate(save_states) i_s = mdp.statelookup[s] v_saves[i, ep] = V[i_s] end end updatesaves!(1) #simulate an episode and update the value function every step function runepisode!(V, j) s = mdp.state_init() while !mdp.isterm(s) (i_s, i_sâ€², r, sâ€², a, i_a) = takestep(mdp, Ï€, s) V[i_s] += Î± * (r + Î³*V[i_sâ€²] - V[i_s]) s = sâ€² end updatesaves!(j+1) return V end for i = 1:num_episodes; runepisode!(V, i); end return V, v_saves end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$2034fd1e-5171-4eda-85d5-2de62d7a1e8b„§cell_idÙ$2034fd1e-5171-4eda-85d5-2de62d7a1e8b¤codeÚ¶function q_learning(mdp::MDP_TD{S, A, F, G, H}, Î±::T, Î³::T; num_episodes = 1000, qinit = zero(T), Ïµinit = one(T)/10, Qinit = initialize_state_action_value(mdp; qinit=qinit), Ï€init = create_Ïµ_greedy_policy(Qinit, 
Ïµinit), decay_Ïµ = false, history_state::S = first(mdp.states), save_history = false, update_policy! = (v, Ïµ, s) -> make_Ïµ_greedy_policy!(v, Ïµ)) where {S, A, F, G, H, T<:AbstractFloat} terminds = findall(mdp.isterm(s) for s in mdp.states) Q = copy(Qinit) Q[:, terminds] .= zero(T) Ï€ = copy(Ï€init) vhold = zeros(T, length(mdp.actions)) #keep track of rewards and steps per episode as a proxy for training speed rewards = zeros(T, num_episodes) steps = zeros(Int64, num_episodes) if save_history history_actions = Vector{A}(undef, num_episodes) end for ep in 1:num_episodes Ïµ = decay_Ïµ ? Ïµinit/ep : Ïµinit s = mdp.state_init() rtot = zero(T) l = 0 while !mdp.isterm(s) (i_s, i_sâ€², r, sâ€², a, i_a) = takestep(mdp, Ï€, s) if save_history && (s == history_state) history_actions[ep] = a end qmax = maximum(Q[i, i_sâ€²] for i in eachindex(mdp.actions)) Q[i_a, i_s] += Î±*(r + Î³*qmax - Q[i_a, i_s]) #update terms for next step vhold .= Q[:, i_s] update_policy!(vhold, Ïµ, s) Ï€[:, i_s] .= vhold s = sâ€² l+=1 rtot += r end steps[ep] = l rewards[ep] = rtot end save_history && return Q, Ï€, steps, rewards, history_actions return Q, Ï€, steps, rewards end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$4382928c-6325-4ecd-b7cf-282525a270ab„§cell_idÙ$4382928c-6325-4ecd-b7cf-282525a270ab¤codeÙŠbegin abstract type MaxBiasStates end struct A <: MaxBiasStates end struct B <: MaxBiasStates end struct Term <: MaxBiasStates end end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$8bc54c94-9c92-4904-b3a6-13ff3f0110bb„§cell_idÙ$8bc54c94-9c92-4904-b3a6-13ff3f0110bb¤codeÚîfunction show_grid_value(mdp, Q::Matrix, wind::Vector, name; action_display = king_action_display, scale = 1.0) width = maximum(s.x for s in mdp.states) height = maximum(s.y for s in mdp.states) start = mdp.state_init() termind = findfirst(mdp.isterm, mdp.states) sterm = mdp.states[termind] ngrid = width*height @htl("""

$(HTML(mapreduce(i -> """

$(round(maximum(Q[:, i]), sigdigits = 2))

""", *, eachindex(mdp.states))))

$(HTML(mapreduce(i -> """

$(wind[i])

""", *, 1:width)))

$(action_display)

Wind Values

""") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$4b1a4c14-3c2b-40c0-995c-cd0334ed8b3a„§cell_idÙ$4b1a4c14-3c2b-40c0-995c-cd0334ed8b3a¤code½md""" #### Normal Actions """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$f0f9d3d5-e76a-4472-bfb1-da29d73a7916„§cell_idÙ$f0f9d3d5-e76a-4472-bfb1-da29d73a7916¤codeÙ‚example_6_5(;mdp = king_gridworld, num_episodes = 400, action_display = king_action_display, policy_display = display_king_policy)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$4c1b286c-2ba9-4293-81e1-bf360baa75fa„§cell_idÙ$4c1b286c-2ba9-4293-81e1-bf360baa75fa¤codeÚmd""" The following argument is taken from ["Double Q-learning"](https://papers.nips.cc/paper_files/paper/2010/file/091d584fced301b442654dd8c23b3fc9-Paper.pdf) by Hado van Hasselt published in _Advances in Neural Information Processing Systems 23 (NIPS 2010)_: Consider a set of $M$ random variables $X=\{X_1, \dots, X_M\}$. We would like to calculate: $$\max_i \mathbb{E} \{X_i\} \tag{a}$$ Without any knowledge of the underlying distribution of each $X_i$ it is impossible to determine $(a)$ exactly. Most often we would approximate it by first constructing approximations for $\mathbb{E} \{ X_i \} \: \forall \: i$. Let $S = \bigcup_{i=1}^M S_i$ denote the set of samples where $S_i$ is the subset containing samples for the variable $X_i$. We assume that the samples in $S_i$ are independent and identically distributed (iid). Unbiased estimates for the expected values can be obtained by computing the sample average for each variable: $\mathbb{E} \{ X_i \} = \mathbb{E} \{ \mu_i \} \approx \mu_i(S) \doteq \frac{1}{\vert S_i \vert } \sum_{s \in S_i} s$ where $\mu_i$ is an estimator for the variable $X_i$. This approximation is unbiased since every sample $s \in S_i$ is an unbiased estimate for the value of $\mathbb{E} \{ X_i \}$. The error in approximation thus consists solely of the variance in the estimator and decreases when we obtain more samples. 
We use the following notations: $f_i$ denotes the probability density function (PDF) of the $i^{th}$ variable $X_i$ and $F_i(x) = \int_{-\infty}^{x} f_i(x)dx$ is the cumulative distribution function (CDF) of this PDF. Similarly, the PDF and CDF of the $i^{th}$ estimator are denoted $f_i^\mu$ and $F_i^\mu$. The maximum expected value can be expressed in terms of the underlying PDFs as $\max_i \mathbb{E} \{ X_i \} = \max_i \int_{-\infty}^\infty x f_i(x)dx$. An obvious way to approximate the value of $(a)$ is to use the value of the maximal estimator: $$\max_i \mathbb{E} \{ X_i \} = \max_i \mathbb{E} \{ \mu_i \} \approx \max_i \mu_i(S) \tag{b}$$ and this is the estimator employed in ordinary Q-learning. This estimator is distributed according to some PDF $f_{\max}^\mu$ that is dependent on the PDFs of the estimators $f_i^\mu$. To determine this PDF, consider the CDF $F_{\max}^\mu(x)$, which gives the probability that the maximum estimate is less than or equal to $x$. This probability is equal to the probability that all the estimates are less than or equal to $x$: $F_{\max}^\mu(x) \doteq P(\max_i \mu_i \leq x) = \prod_{i=1}^M P(\mu_i\leq x) \doteq \prod_{i=1}^M F_i ^\mu (x)$. The value $\max_i \mu_i(S)$ is an unbiased estimate for $\mathbb{E} \{ \max_j \mu_j \} = \int_{-\infty}^{\infty} x f_{\max}^\mu(x)dx$ which can thus be given by: $$\mathbb{E} \{ \max_j \mu_j \} = \int_{-\infty}^{\infty} x \frac{d}{dx} \prod_{i=1}^M F_i ^ \mu (x) dx = \sum_{j=1}^M \int_{-\infty}^{\infty}x f_j ^ \mu (x) \prod_{i \neq j}^M F_i ^ \mu(x) dx \tag{c}$$ However, in $(a)$ the order of the max operator and the expectation operator is the other way around. The following illustrates why $(c)$ has a positive bias. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$3134e913-1e86-495d-a558-c3ec4828bf7b„§cell_idÙ$3134e913-1e86-495d-a558-c3ec4828bf7b¤codeÙºbegin_value_iteration_v(mdp::FiniteMDP{T,S,A}, Î³::T; Vinit::T = zero(T), kwargs...) 
where {T<:Real,S,A} = begin_value_iteration_v(mdp, Î³, Vinit .* ones(T, size(mdp.ptf, 1)); kwargs...)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$db31579e-3e56-4271-8fc3-eb13bc95ac27„§cell_idÙ$db31579e-3e56-4271-8fc3-eb13bc95ac27¤codeÙ[md""" Adding the no-movement action doesn't seem to change the shortest path of 7 steps. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$943b6d7e-14a4-4532-90c7-dd5080be0c6e„§cell_idÙ$943b6d7e-14a4-4532-90c7-dd5080be0c6e¤codeÙ%const noisy_rewards = [-1.2f0, 1.0f0]¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$84584793-8274-4aa1-854f-b167c7434548„§cell_idÙ$84584793-8274-4aa1-854f-b167c7434548¤codeÚ Ûfunction gridworld_Q_vs_sarsa_vs_expected_sarsa_solve(mdp; Î±=0.5f0, Ïµ=0.1f0, num_episodes = 500, nruns = 100) function addtuple(t1, t2) Tuple(t1[i] .+ t2[i] for i in eachindex(t1)) end sarsa_results = mapreduce(addtuple, 1:nruns) do _ sarsa(mdp, Î±, 1.0f0; num_episodes = num_episodes, Ïµinit = Ïµ) end qlearning_results = mapreduce(addtuple, 1:nruns) do _ q_learning(mdp, Î±, 1.0f0; num_episodes = num_episodes, Ïµinit = Ïµ) end expected_sarsa_results = mapreduce(addtuple, 1:nruns) do _ expected_sarsa(mdp, Î±, 1.0f0; num_episodes = num_episodes, Ïµinit = Ïµ) end # double_expected_sarsa_results = mapreduce(addtuple, 1:nruns) do _ # double_q_learning(mdp, Î±, 1.0f0; num_episodes = num_episodes, Ïµinit = Ïµ) # end # qlearning_results = [q_learning(mdp, Î±, 1.0f0; num_episodes = num_episodes, Ïµinit = Ïµ) for _ in 1:nruns] p1 = plot_path(mdp, create_greedy_policy(sarsa_results[1] ./ nruns); windtext = fill("", 12), xtitle = "", title = "Cliff Walking Sarsa Path") p2 = plot_path(mdp, create_greedy_policy(qlearning_results[1] ./ nruns); windtext = fill("", 12), xtitle = "", title = "Cliff Walking Q Learning Path") expected_sarsa_path = plot_path(mdp, create_greedy_policy(expected_sarsa_results[1] ./ nruns); windtext = fill("", 12), xtitle = "", title = "Cliff Walking Expected Sarsa Path") # double_expected_sarsa_path = 
plot_path(mdp, create_greedy_policy(double_expected_sarsa_results[1] ./ nruns); windtext = fill("", 12), xtitle = "", title = "Cliff Walking Double Expected Sarsa Path") traces = [scatter(x = 1:num_episodes, y = results[4] ./ nruns, name = name) for (results, name) in zip([sarsa_results, qlearning_results, expected_sarsa_results], ["Sarsa", "Q-learning", "Expected Sarsa"])] p3 = plot(traces, Layout(xaxis_title = "Episodes", yaxis = attr(title = "Sum of rewards during episode", range = [-100, -15]))) steptraces = [scatter(x = 1:num_episodes, y = results[3] ./ nruns, name = name) for (results, name) in zip([sarsa_results, qlearning_results, expected_sarsa_results], ["Sarsa", "Q-learning", "Expected Sarsa"])] p4 = plot(steptraces, Layout(xaxis_title = "Episodes", yaxis = attr(title = "Average steps per episode
during training", range = [0, 100]))) @htl("""

$p1

$p2

$expected_sarsa_path

$p3 $p4 """ ) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$9f28772c-9afe-4253-ab3b-055b0f48be6e„§cell_idÙ$9f28772c-9afe-4253-ab3b-055b0f48be6e¤codeÚfunction plot_path(mdp, Ï€; title = "Optimal policy
path example", windtext = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0], xtitle = "Wind Values") eg = runepisode(mdp, Ï€; max_steps = 100) xmax = maximum([s.x for s in mdp.states]) ymax = maximum([s.y for s in mdp.states]) start = mdp.state_init() goal = mdp.states[findfirst(mdp.isterm(s) for s in mdp.states)] start_trace = scatter(x = [start.x + 0.5], y = [start.y + 0.5], mode = "text", text = ["S"], textposition = "left", showlegend=false) finish_trace = scatter(x = [goal.x + .5], y = [goal.y + .5], mode = "text", text = ["G"], textposition = "left", showlegend=false) path_traces = [scatter(x = [eg[1][i].x + 0.5, eg[1][i+1].x + 0.5], y = [eg[1][i].y + 0.5, eg[1][i+1].y + 0.5], line_color = "blue", mode = "lines", showlegend=false, name = "Optimal Path") for i in 1:length(eg[1])-1] finalpath = scatter(x = [eg[1][end].x + 0.5, last(eg).x + .5], y = [eg[1][end].y + 0.5, last(eg).y + 0.5], line_color = "blue", mode = "lines", showlegend=false, name = "Optimal Path") h1 = 30*ymax plot([start_trace; finish_trace; path_traces; finalpath], Layout(xaxis = attr(showgrid = true, showline = true, gridwidth = 1, gridcolor = "black", zeroline = true, linecolor = "black", mirror=true, tickvals = 1:xmax, ticktext = windtext, range = [1, xmax+1], title = xtitle), yaxis = attr(linecolor="black", mirror = true, gridcolor = "black", showgrid = true, gridwidth = 1, showline = true, tickvals = 1:ymax, ticktext = fill("", ymax), range = [1, ymax+1]), width = max(30*xmax, 200), height = max(h1, 200), autosize = false, padding=0, paper_bgcolor = "rgba(0, 0, 0, 0)", title = attr(text = title, font_size = 14, x = 0.5))) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$1dd1ba55-548a-41f6-903e-70742fd60e3d„§cell_idÙ$1dd1ba55-548a-41f6-903e-70742fd60e3d¤codeÙ>show_mrp_state("eg1", mrp_trajectory[1], mrp_trajectory[3], t)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$2a3e4617-efbb-4bbc-9c61-8535628e439c„§cell_idÙ$2a3e4617-efbb-4bbc-9c61-8535628e439c¤codeÚ™md""" > ### *Exercise 
6.12* > Suppose action selection is greedy. Is Q-learning then exactly the same algorithm as Sarsa? Will they make exactly the same action selections and weight updates? Consider both updates when the greedy policy is followed during training. Sarsa Update: $Q_\pi(S_t, A_t) = \text{E}_\pi [R_{t+1} + \gamma Q_\pi(S_{t+1}, A_{t+1})]$ with $A_{t+1}$ chosen greedily as $\text{argmax}_a Q_\pi(S_{t+1}, a)$ using the estimates prior to this update. Q-Learning Update: $Q_\pi(S_t, A_t) = \text{E}_\pi [R_{t+1} + \gamma \text{max}_a Q_\pi(S_{t+1}, a)]$ The value updates are identical since the Q estimate used in both cases is the one for the maximizing action at state $S_{t+1}$. In the case of Sarsa, $A_{t+1}$ has already been selected before this update occurs, so this value update properly reflects the next step in the trajectory. In Q-learning, the action selection at $S_{t+1}$ occurs after the update step. Notice that we only updated $Q_\pi(S_t, A_t)$ and did not touch any of the values $Q_\pi(S_{t+1}, a)$, so our next action selection should be unaffected by this update. However, there is one exception: the case where the state is unchanged by the transition, $S_t = S_{t+1}$. In this case, the update can affect the next action selection. For example, suppose a very low reward was received during the update. That would lower the estimate for the action selected on step $t$, and it may no longer be maximizing on step $t+1$. Sarsa would still take the action it chose before the update, but Q-learning would choose a different action on the next step even though the state is unchanged. Despite this difference, both methods are still estimating the state-action value function for the optimal policy, but neither is guaranteed to converge to it, because a purely greedy policy violates the assumption that all state-action pairs continue to be visited during training. 
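
A minimal Julia sketch of this argument, assuming `Q` is an actions-by-states matrix as elsewhere in this notebook. The helper names `greedy`, `sarsa_greedy_step!`, and `qlearning_greedy_step!` are hypothetical; each applies one update under a purely greedy policy and returns the action that method would take next.

```julia
# Q[i_a, i_s]: action values, actions along rows and states along columns (assumed layout)
greedy(Q, i_s) = argmax(view(Q, :, i_s))

# Sarsa with a greedy policy: A_{t+1} is selected *before* Q(S_t, A_t) is updated
function sarsa_greedy_step!(Q, i_s, i_a, r, i_s2, alpha, gamma)
    i_a2 = greedy(Q, i_s2)
    Q[i_a, i_s] += alpha * (r + gamma * Q[i_a2, i_s2] - Q[i_a, i_s])
    return i_a2
end

# Q-learning with a greedy policy: bootstrap from the max, then select A_{t+1} *after* the update
function qlearning_greedy_step!(Q, i_s, i_a, r, i_s2, alpha, gamma)
    Q[i_a, i_s] += alpha * (r + gamma * maximum(view(Q, :, i_s2)) - Q[i_a, i_s])
    return greedy(Q, i_s2)
end

# Self-transition (S_{t+1} = S_t = state 1) with a large negative reward
Q0 = [1.0 0.0; 0.5 0.0]
Qs, Qq = copy(Q0), copy(Q0)
next_sarsa = sarsa_greedy_step!(Qs, 1, 1, -10.0, 1, 0.5, 1.0)
next_q = qlearning_greedy_step!(Qq, 1, 1, -10.0, 1, 0.5, 1.0)
# Qs == Qq (both set Q[1, 1] to -4.0), but next_sarsa == 1 while next_q == 2
```

Both methods write the same value, but only in the self-transition case can their next greedy choices differ; when $S_{t+1} \neq S_t$ the two functions return the same action.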
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$5f32fed0-c921-4cbb-85fe-ade54d4c6c95„§cell_idÙ$5f32fed0-c921-4cbb-85fe-ade54d4c6c95¤codeÚImd""" At each state or checkpoint you try to predict how much longer it will take to get home using any information that is relevant. Notice that regardless of how inaccurate we were on previous steps, we can still make an accurate prediction for the time to go.

|State|Elapsed Time (minutes)|Predicted Time to Go (minutes)|Predicted Total Time (minutes)|
|---|---|---|---|
|leaving office, friday at 6|0|30|30|
|reach car, raining|5|35|40|
|exiting highway|20|15|35|
|2ndary road, behind truck|30|10|40|
|entering home street|40|3|43|
|arriving home|43|0|43|

The rewards in this example are the elapsed times on each leg of the journey and there is no discounting, so the return for each state is the actual time to go from that state. The value of each state is the *expected* time to go. The second column of numbers gives the current estimated value for the state encountered. A simple way to view the operation of Monte Carlo methods is to plot the predicted total time (the last column) over the sequence. For each state we would compare that value with the actual total time, which was 43 minutes. 
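
The comparison can be checked numerically with the numbers in the table. This is a minimal Julia sketch (all variable names are illustrative) that computes the Monte Carlo error $G_t - V(S_t)$ and the TD error $\delta_t = R_{t+1} + V(S_{t+1}) - V(S_t)$ for each non-terminal state, using $\gamma = 1$ as stated above and a value of 0 for the terminal state.

```julia
# Values transcribed from the driving-home table (minutes)
elapsed = [0, 5, 20, 30, 40, 43]          # elapsed time when each state is reached
predicted_to_go = [30, 35, 15, 10, 3, 0]  # V(S_t); the terminal state (arriving home) has value 0

# Rewards are the durations of each leg of the journey
leg_times = diff(elapsed)                 # [5, 15, 10, 10, 3]

# Monte Carlo error: actual time to go (the return, since gamma = 1) minus the prediction
returns = elapsed[end] .- elapsed[1:end-1]
mc_errors = returns .- predicted_to_go[1:end-1]

# TD(0) error: duration of the next leg plus the next prediction, minus the current prediction
td_errors = leg_times .+ predicted_to_go[2:end] .- predicted_to_go[1:end-1]

# With gamma = 1 and the predictions held fixed, each MC error is the sum of the later TD errors
telescopes = all(mc_errors[t] == sum(td_errors[t:end]) for t in eachindex(mc_errors))
```

Running this gives `mc_errors = [13, 3, 8, 3, 0]` and `td_errors = [10, -5, 5, 3, 0]`. The Monte Carlo errors are the gaps between each predicted total time and the actual 43 minutes, while each TD error is the change in predicted total time from one state to the next.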
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$a3d10753-2ec3-4252-9629-834145678b6a„§cell_idÙ$a3d10753-2ec3-4252-9629-834145678b6a¤codeÙ'md""" ### Afterstate Implementation """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$12aac612-758b-4655-8ede-daddd4af6d3e„§cell_idÙ$12aac612-758b-4655-8ede-daddd4af6d3e¤codeÚ¡#take a step in the environment from state s using policy Ï€ and generate the subsequent action selection as well function sarsa_step(mdp::MDP_TD{S, A, F, G, H}, Ï€::Matrix{T}, s::S, a::A) where {S, A, F<:Function, G<:Function, H<:Function, T<:Real} (r, sâ€²) = mdp.step(s, a) i_sâ€² = mdp.statelookup[sâ€²] i_aâ€² = sample_action(Ï€, i_sâ€²) aâ€² = mdp.actions[i_aâ€²] return (sâ€², i_sâ€², r, aâ€², i_aâ€²) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$2c49900b-3c57-4d9a-b3dc-ef9cc20c30c1„§cell_idÙ$2c49900b-3c57-4d9a-b3dc-ef9cc20c30c1¤codeÚ@md""" To understand the origin of the bias, consider a case with two variables $X$ and $Y$, each following a standard normal distribution, where we only have a single sample from each. The true maximum expected value is 0, but our estimate of it is just $\max(x, y)$, where $x$ and $y$ are the samples from $X$ and $Y$ respectively. The expected value of this estimator can be calculated using the distribution of the maximum of two independent standard normal random variables: $\mathbb{E}\left [ \text{max}(\mathcal{N}(0, 1), \mathcal{N}(0, 1)) \right ] = \frac{1}{\sqrt{\pi}} \approx 0.564$ Indeed, on the plot for 2 variables with 1 sample collected for each, the average observed value is 0.56, and the value increases as more variables are added. So our estimate has a positive bias even though all the underlying variables have exactly the same distribution. If we had more samples for each variable, we would use the sample average rather than a single sample, and the variance of the sample average is inversely proportional to the number of samples. 
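
The single-estimator bias can be reproduced with a few lines of Julia. This is a minimal sketch, not the code behind the plots in this notebook: `single_estimator_bias` is a hypothetical helper that draws `nsamples` standard normal samples for each of `nvars` variables, takes the largest sample mean, and averages that maximum over `nruns` repetitions. Since every variable has mean 0, any positive result is bias. The sample mean of $n$ standard normal draws is distributed as $\mathcal{N}(0, 1/n)$, so for two variables the expected maximum is $1/\sqrt{\pi n}$.

```julia
using Random, Statistics

# Average of max_i mu_i over many runs, where each mu_i is the sample mean of
# nsamples draws from N(0, 1). The true max_i E[X_i] is 0.
function single_estimator_bias(nvars, nsamples; nruns = 100_000, rng = Random.default_rng())
    total = 0.0
    for _ in 1:nruns
        means = [mean(randn(rng, nsamples)) for _ in 1:nvars]
        total += maximum(means)
    end
    return total / nruns
end

rng = MersenneTwister(1)
bias_1 = single_estimator_bias(2, 1; rng = rng)      # near 1/sqrt(pi), about 0.564
bias_100 = single_estimator_bias(2, 100; rng = rng)  # near 1/sqrt(100 pi), about 0.0564
bias_many = single_estimator_bias(8, 1; rng = rng)   # larger still with more variables
```

Increasing `nsamples` shrinks the bias like $1/\sqrt{n}$, while increasing `nvars` makes it grow, which matches the trends in the plots.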
So the bias will converge to zero in the limit of infinite samples, and in the graph the bias does in fact shrink toward zero as more samples are collected. There is a method of eliminating this positive bias using a so-called *double estimator*, first introduced by Hado van Hasselt in a paper published at NIPS 2010. Below is a more thorough overview of the paper, but first I will provide a conceptual sketch of the proof. First consider a set of $M$ random variables $X = \{X_1, \dots, X_M \}$; our goal is to estimate $\max_i \mathbb{E} \{ X_i \}$. In the single estimator case, we draw samples from each variable and construct an unbiased estimator $\mu_i$ for each mean. After we have collected some set of samples, this method assumes that whichever estimator (or set of estimators) has the maximum value corresponds to the variable with the maximum expected value. If there is zero overlap in the distributions of the random variables, then these estimators will always be ranked in the same order as the true expected values and our estimate will be unbiased. However, if there is any overlap in the underlying distributions (this also includes the case where all distributions are identical), then there is some non-zero probability that the true maximum index is NOT in the set of indices for the maximum estimators. Let's say the apparent maximizing index from the sample is $s^*$ while one of the true maximizing indices is $j \neq s^*$. So our final estimate for the maximum expected value will be $\mu_{s^*}$. We already know that $\mathbb{E} \{ X_j \} = \max_i \mathbb{E} \{X_i \}$ by assumption. We also know that $\mu_{s^*} > \mu_j$ in the sample and $\mathbb{E} \{ \mu_j\} = \max_i \mathbb{E} \{X_i \}$, which is the true value that we want. So we would expect this estimator to be larger than the true answer, or equal to it in the case where the selected index is correct. 
This is true even if all the variables share the same distribution: every estimate has the same expected value, which is the true answer, yet the one estimate we use for the maximum is, by construction, at least as large as all of those unbiased alternatives. The underlying reason this tends to overestimate is that in any finite sample we are not guaranteed to know the correct maximizing index, and any variable that produces samples high enough to exceed the true maximum will always be selected to represent that maximum. In the double estimator case, we split the samples into two sets $\mathcal{A}$ and $\mathcal{B}$ such that $\mathcal{A} \cap \mathcal{B} = \emptyset$ and build a set of estimators from each set, $\mu_i^\mathcal{A}$ and $\mu_i^\mathcal{B}$. Let $a^*$ be an index in the set of indices with the maximum estimated value in set $\mathcal{A}$. Again, if the underlying distributions overlap at all, then there is some probability that this index is not in the set of true maximizing indices. However, if all the distributions are equal, then whichever index we pick is still guaranteed to be correct. To estimate the actual value of the maximum, we take $\mu_{a^*}^\mathcal{B}$, which is the estimate from set $\mathcal{B}$ at the maximizing index from set $\mathcal{A}$. Just like in the single estimator case, if this happens to be a correct index, then we have an unbiased estimate for the true value. However, if the index is wrong, we are estimating the expected value of a non-maximizing index from an independent set of samples. By the definition of the maximizing indices, we know that in this case $\mathbb{E} \{ \mu_{a^*}^\mathcal{B} \} \lt \max_i \mathbb{E} \{ X_i \}$, resulting in a negative bias for our estimate. Just like in the single estimator case, this estimate will be unbiased if there is no overlap in the underlying probability distributions for each variable. 
Unlike the single estimator case, this estimate will also be unbiased if all the underlying distributions are equal. See below for a visualization of the bias removal for the iid case as well as the more formal proof for both methods. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$e26f788e-f602-403e-929e-6c98a6e6bf79„§cell_idÙ$e26f788e-f602-403e-929e-6c98a6e6bf79¤codeÚ¶md""" The double estimator methods are the only ones that don't show an initial increase in the percentage of left actions taken from state A. After enough time, though, every method starts to converge to the policy that takes the direct path. If $\alpha$ is not low enough, Q-learning fails to converge toward the optimal policy and has diverging value estimates. Both double methods are very stable and correctly estimate every state to have a negative value. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$c09530bc-f37e-4d57-a267-14d4027147da„§cell_idÙ$c09530bc-f37e-4d57-a267-14d4027147da¤codeÚmd""" Returning to the definition of $\eta_t$, we can simplify further: $\eta_{t} \doteq V_{t+1}(S_{t+1}) - V_t(S_{t+1})$ This quantity is the change in the value estimate at a state between two time steps. Note that between times $t$ and $t+1$ we have only performed an update for the value at state $S_t$, using the equation: $V_{t+1}(S_t) = V_t(S_t) + \alpha \delta_t$ If $S_{t+1} \neq S_t$, then this update does not change the value estimate at $S_{t+1}$, so $V_{t+1}(S_{t+1}) = V_t(S_{t+1}) \implies \eta_{t} = 0$ The only case in which $V_{t+1}(S_{t+1}) \neq V_t(S_{t+1})$ is when $S_t = S_{t+1} = S$. 
In this case, $V_{t+1}(S) = V_t(S) + \alpha \delta_t \implies V_{t+1}(S) - V_t(S) = \alpha \delta_t$ So we can rewrite $\eta_{t} = \alpha \delta_t \mathbb{1}_{t}$ where $\mathbb{1}_{t} = \begin{cases} 1 & \text{if } S_{t+1} = S_t \\ 0 & \text{otherwise} \end{cases}$ So the original equation can be written as: $\begin{flalign} G_t - V_t(S_t) &= \sum_{k=t}^{T-1} \gamma^{k-t} (\delta_k + \gamma \alpha \delta_k \mathbb{1}_k) \\ &= \sum_{k=t}^{T-1} \gamma^{k-t} \delta_k (1 + \gamma \alpha \mathbb{1}_k) \\ \end{flalign}$ where the first term is the value from the original derivation and the second term is only nonzero when the same state appears twice consecutively in an episode. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$0c0b875e-69f8-46ed-ad06-df9c36088fbe„§cell_idÙ$0c0b875e-69f8-46ed-ad06-df9c36088fbe¤code²const gridsize = 3¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$8d05403a-adeb-40ac-a98a-87586d5a5170„§cell_idÙ$8d05403a-adeb-40ac-a98a-87586d5a5170¤codeÙ*md""" ### Example 6.5: Windy Gridworld """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$44c49006-e210-4f97-916e-fe62f36c593f„§cell_idÙ$44c49006-e210-4f97-916e-fe62f36c593f¤codeÚCmd""" ## 6.5 Q-learning: Off-policy TD Control One of the early breakthroughs in reinforcement learning was the development of an off-policy TD control algorithm known as *Q-learning* (Watkins, 1989), defined by $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha [R_{t+1} + \gamma \text{max}_a Q(S_{t+1}, a) - Q(S_t, A_t)]$ """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$0ad739c9-8aca-4b82-bf20-c73584d29535„§cell_idÙ$0ad739c9-8aca-4b82-bf20-c73584d29535¤codeÚjmd""" > ### *Exercise 6.9 Windy Gridworld with King's Moves (programming)* > Re-solve the windy gridworld assuming eight possible actions, including the diagonal moves, rather than four. How much better can you do with the extra actions? 
Can you do even better by including a ninth action that causes no movement at all other than that caused by the wind? """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$0748902c-ffc0-4634-9a1b-e642b3dfb77b„§cell_idÙ$0748902c-ffc0-4634-9a1b-e642b3dfb77b¤codeÚR#forms a random policy for a generic finite state mdp. The policy is a matrix where the rows represent actions and the columns represent states. Each column is a probability distribution of actions over that state.
form_random_policy(mdp::CompleteMDP{T}) where T = ones(T, length(mdp.actions), length(mdp.states)) ./ length(mdp.actions)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$6a1503c6-c77b-4e3a-9f07-74b2af1a5ff7„§cell_idÙ$6a1503c6-c77b-4e3a-9f07-74b2af1a5ff7¤codeÙ"md""" ### Sarsa Implementation """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$292d9018-b550-4278-a8e0-78dd6a6853f1„§cell_idÙ$292d9018-b550-4278-a8e0-78dd6a6853f1¤codeÚÚfunction expected_sarsa(mdp::MDP_TD{S, A, F, G, H}, α::T, γ::T; 
	num_episodes = 1000, 
	qinit = zero(T), 
	ϵinit = one(T)/10, 
	Qinit = initialize_state_action_value(mdp; qinit=qinit), 
	πinit = create_ϵ_greedy_policy(Qinit, ϵinit), 
	update_policy! = (v, ϵ, s) -> make_ϵ_greedy_policy!(v, ϵ), 
	decay_ϵ = false, 
	save_history = false, 
	save_state = first(mdp.states)) where {S, A, F, G, H, T<:AbstractFloat}
	
	terminds = findall(mdp.isterm(s) for s in mdp.states)
	Q = copy(Qinit)
	Q[:, terminds] .= zero(T)
	π = copy(πinit)
	vhold = zeros(T, length(mdp.actions))
	#keep track of rewards and steps per episode as a proxy for training speed
	rewards = zeros(T, num_episodes)
	steps = zeros(Int64, num_episodes)
	if save_history
		action_history = Vector{A}(undef, num_episodes)
	end
	for ep in 1:num_episodes
		ϵ = decay_ϵ ? ϵinit/ep : ϵinit
		s = mdp.state_init()
		rtot = zero(T)
		l = 0
		while !mdp.isterm(s)
			(i_s, i_s′, r, s′, a, i_a) = takestep(mdp, π, s)
			if save_history && (s == save_state)
				action_history[ep] = a
			end
			#expected next state-action value under the current policy
			q_expected = sum(π[i, i_s′]*Q[i, i_s′] for i in eachindex(mdp.actions))
			Q[i_a, i_s] += α*(r + γ*q_expected - Q[i_a, i_s])
			#update terms for next step
			vhold .= Q[:, i_s]
			update_policy!(vhold, ϵ, s)
			π[:, i_s] .= vhold
			s = s′
			l+=1
			rtot += r
		end
		steps[ep] = l
		rewards[ep] = rtot
	end
	base_return = (Q, π, steps, rewards)
	save_history && return (base_return..., action_history)
	return base_return
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$07c57f37-22be-4c39-8279-d80addcea0c5„§cell_idÙ$07c57f37-22be-4c39-8279-d80addcea0c5¤codeÚºfunction create_stochastic_gridworld_mdp(width, height, start, goal, wind, actions, step_reward)
	#use the supplied wind values so the deterministic step and the stochastic transition probabilities below agree
	mdp = make_windy_gridworld(;actions = actions, apply_wind = apply_wind, sterm = goal, start = start, xmax = width, ymax = height, winds = wind, get_step_reward = () -> step_reward)
	ptf = zeros(Float32, length(mdp.states), 2, length(mdp.actions), length(mdp.states))
	for s in mdp.states
		i_s = mdp.statelookup[s]
		if mdp.isterm(s)
			ptf[i_s, 1, :, i_s] .= 1.0f0
		else
			for a in mdp.actions
				w = wind[s.x]
				(r, s′) = mdp.step(s, a)
				i_a = mdp.actionlookup[a]
				i_s′ = mdp.statelookup[s′]
				if w == 0
					ptf[i_s′, 2, i_a, i_s] = 1.0f0
				else
					#with stochastic wind split the probabilities between the possible outcomes
					ptf[i_s′, 2, i_a, i_s] += Float32(1/3)
					s′2 = GridworldState(s′.x, min(height, s′.y + 1))
					i_s′2 = mdp.statelookup[s′2]
					ptf[i_s′2, 2, i_a, i_s] += Float32(1/3)
					s′3 = GridworldState(s′.x, max(1, s′.y - 1))
					i_s′3 = mdp.statelookup[s′3]
					ptf[i_s′3, 2, i_a, i_s] += Float32(1/3)
				end
			end
		end
	end
	FiniteMDP(mdp.states, mdp.actions, [0.0f0, step_reward], ptf)
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$b5187232-d808-49b6-9f7e-a4cbeb6c2b3e„§cell_idÙ$b5187232-d808-49b6-9f7e-a4cbeb6c2b3e¤codeÙ'md""" ### Example 6.1: Driving Home """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$54d97122-2d01-46ec-aafe-00bfc9f2d6d1„§cell_idÙ$54d97122-2d01-46ec-aafe-00bfc9f2d6d1¤codeÙ[md""" Step: $(min(length(first(mrp_trajectory)), t)) / $(length(first(mrp_trajectory))) """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$926ec37d-b969-4dc9-99b2-a6b29c6d880c„§cell_idÙ$926ec37d-b969-4dc9-99b2-a6b29c6d880c¤codeºmd""" #### Figure 6.5: """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$c360945e-f8b2-4c6f-a70c-6ab4ddcf5b54„§cell_idÙ$c360945e-f8b2-4c6f-a70c-6ab4ddcf5b54¤codeÙ¾md""" By changing the initialization to 0, the RMS error monotonically converges to the minimum since the state values never pass through the correct values on their way to overshooting. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$573a9919-bd7e-4a56-b830-4e40e91288ef„§cell_idÙ$573a9919-bd7e-4a56-b830-4e40e91288ef¤codeÚ7md""" Let $X = \{ X_1, \dots, X_M \}$ be a set of random variables and let $\mu^A = \{\mu_1^A, \dots, \mu_M^A \}$ and $\mu^B = \{\mu_1^B, \dots, \mu_M^B\}$ be two independent sets of unbiased estimators (for example, computed from disjoint sets of samples) such that $\mathbb{E} \{ \mu_i^A \} = \mathbb{E} \{ \mu_i^B \} = \mathbb{E} \{ X_i \}$ for all $i$. Let $$\mathcal{M} \doteq \left \{ j \mid \mathbb{E} \{ X_j \} = \max_i \mathbb{E} \{ X_i \} \right \}$$ be the set of labels of the variables that maximize the expected value of $X$. Let $a^*$ be an element that maximizes $\mu^A$: $\mu_{a^*}^A = \max_i \mu_i^A$. The claim is that: $$\mathbb{E} \{ \mu_{a^*}^B \} = \mathbb{E} \{ X_{a^*} \} \leq \max_i \mathbb{E} \{ X_i \}$$ Furthermore, the inequality is strict if and only if $P(a^* \notin \mathcal{M}) \gt 0$. *Proof*. Assume $a^* \in \mathcal{M}$. Then $\mathbb{E} \{ \mu_{a^*}^B\} = \mathbb{E} \{ X_{a^*}\} = \max_i \mathbb{E} \{ X_i \}$ by the definition of $\mathcal{M}$.
Now assume $a^* \notin \mathcal{M}$ and choose $j \in \mathcal{M}$. Then $\mathbb{E} \{ \mu_{a^*}^B \} = \mathbb{E} \{ X_{a^*}\} \lt \mathbb{E} \{ X_j \} = \max_i \mathbb{E} \{ X_i \}$. These two possibilities are mutually exclusive and exhaustive, so the combined expression can be written as: $$\begin{flalign} \mathbb{E} \{ \mu_{a^*}^B \} &= P(a^* \in \mathcal{M}) \mathbb{E} \{ \mu_{a^*}^B \vert a^* \in \mathcal{M} \} + P(a^* \notin \mathcal{M}) \mathbb{E} \{ \mu_{a^*}^B \vert a^* \notin \mathcal{M} \} \\ &= P(a^* \in \mathcal{M}) \max_i \mathbb{E} \{X_i \} + P(a^* \notin \mathcal{M}) \mathbb{E} \{ \mu_{a^*}^B \vert a^* \notin \mathcal{M} \} \\ &\leq P(a^* \in \mathcal{M}) \max_i \mathbb{E} \{X_i \} + P(a^* \notin \mathcal{M}) \max_i \mathbb{E} \{ X_i \} \\ &=\max_i \mathbb{E} \{ X_i \} \end{flalign}$$ The inequality is strict if and only if $P(a^* \notin \mathcal{M}) \gt 0$, where $\mathcal{M}$ is the true set of maximizing variables. This happens when the variables have different expected values but their estimates overlap enough that a non-maximizing variable is sometimes selected. In contrast with the single estimator, the double estimator is unbiased when the variables are iid, since then all expected values are equal and $P(a^* \in \mathcal{M}) = 1$. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6„§cell_idÙ$4556cf44-4a1c-4ca4-bfb8-4841301a2ce6¤codeÚVfunction display_rook_policy(v::Vector{T}; scale = 1.0) where T<:AbstractFloat
	@htl("""

""")
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$bb085f2e-83cb-45b2-adf6-c07da892d6e1„§cell_idÙ$bb085f2e-83cb-45b2-adf6-c07da892d6e1¤codeÚbegin
	car_results = begin_value_iteration_v(jacks_car_mdp, 0.9f0; θ = 0.0001f0)
	π_car, v_car = makepolicyvalueplots(jacks_car_mdp, car_results[1][end], car_results[2], length(car_results[1]))
	md"""
	### Value Iteration Results for Jack's Car Rental
	$([π_car v_car])
	"""
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$e9359ca3-4d11-4365-bc6e-7babc6fcc7de„§cell_idÙ$e9359ca3-4d11-4365-bc6e-7babc6fcc7de¤codeÙJbegin
	struct Stay <: GridworldAction end
	move(::Stay, x, y) = (x, y)
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$639840dc-976a-4e5c-987f-a92afb2d99d8„§cell_idÙ$639840dc-976a-4e5c-987f-a92afb2d99d8¤codeÙ²begin
	using StatsBase, Statistics, PlutoUI, HypertextLiteral, LaTeXStrings, PlutoPlotly, Base.Threads, LinearAlgebra, Serialization, Latexify, Transducers
	TableOfContents()
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$dd167494-99d6-45c6-99e4-c36fde5e2d3f„§cell_idÙ$dd167494-99d6-45c6-99e4-c36fde5e2d3f¤codeÙ#md""" ## Jack's Car Rental Code """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$ab331778-f892-4690-8bb3-26464e3fc05f„§cell_idÙ$ab331778-f892-4690-8bb3-26464e3fc05f¤codeÙ.const windy_gridworld = make_windy_gridworld()¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$0e59e813-3d48-4a24-b5b3-9a9de7c500c2„§cell_idÙ$0e59e813-3d48-4a24-b5b3-9a9de7c500c2¤codeÚ }md""" > ### *Exercise 6.7*
> Design an off-policy version of the TD(0) update that can be used with arbitrary target policy $\pi$ and covering behavior policy $b$, using at each step $t$ the importance sampling ratio $\rho_{t:t}$ (5.3).

Recall that equation 5.3 defines: $\rho_{t:T-1} = \prod_{k=t}^{T-1}\frac{\pi(A_k|S_k)}{b(A_k|S_k)}$ with the property that: $\mathbb{E}[\rho_{t:T-1}G_t \mid S_t = s] = v_\pi(s)$ when $G_t$ is generated by the behavior policy.
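
As a concrete illustration of (5.3), the ratio $\rho_{t:T-1}$ is just a running product of per-step probability ratios along the trajectory. The helper below is a hypothetical sketch (`importance_ratio` is not defined elsewhere in this notebook), assuming both policies are stored as action-by-state probability matrices like the policies used throughout this notebook, and that the trajectory is given as integer state and action indices.

```julia
# Hypothetical helper: rho_{t:T-1} = product over k = t..T-1 of pi(A_k|S_k) / b(A_k|S_k).
# Policies are matrices whose rows index actions and whose columns index states.
# states[k] and actions[k] hold S_k and A_k (1-based positions along the trajectory).
function importance_ratio(target::AbstractMatrix, behavior::AbstractMatrix, states::Vector{Int}, actions::Vector{Int}, t::Int)
	ratio = 1.0
	for k in t:length(actions)
		ratio *= target[actions[k], states[k]] / behavior[actions[k], states[k]]
	end
	return ratio
end

target = [1.0 0.5; 0.0 0.5]    # pi(a|s): 2 actions x 2 states
behavior = [0.5 0.5; 0.5 0.5]  # b(a|s): uniform, so it covers pi
states = [1, 2]
actions = [1, 2]
importance_ratio(target, behavior, states, actions, 1)  # (1.0/0.5) * (0.5/0.5) = 2.0
```

Starting the product at the last step leaves a single factor, which is exactly the one-step ratio $\rho_{t:t}$ used in this exercise.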
The TD(0) update rule is given by: $V(S_t) \leftarrow V(S_t) + \alpha [R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$ based on the following form of the Bellman equation: $v_\pi (s)=\mathbb{E}_\pi[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s]$ In the off-policy case, the reward $R_{t+1}$ and the subsequent state $S_{t+1}$ would be generated from the behavior policy, but the subsequent value would still be based on the target policy value function. Consider instead the quantity: $q_\pi(s, a) = \mathbb{E} [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a]$ where we have removed the policy from the expectation since, once the action $A_t = a$ is fixed, nothing in the brackets depends on which policy selected it. Even if we choose actions $a$ based on a behavior policy that differs from the target policy, these estimates will be correct because we are directly calculating the value of choosing that action, regardless of how likely it was to be chosen. Consider that we are following some behavior policy $b$ and recall that: $\begin{flalign} v_\pi(s) &= \sum_a \pi(a \vert s) q_\pi (s, a) \\ &= \sum_a \pi(a \vert s) \mathbb{E} [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a]\\ &= \mathbb{E}_\pi [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s]\\ \mathbb{E}_b [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s] &= \sum_a b(a \vert s) \mathbb{E} [R_{t+1} + \gamma v_{\pi}(S_{t+1}) \mid S_t = s, A_t = a] \\ &= \sum_a b(a \vert s) q_\pi (s, a) \\ \end{flalign}$ The second quantity is not $v_b(s)$ (that would require $q_b$ rather than $q_\pi$), and in general it differs from $v_\pi(s)$; it is simply what the one-step target averages to when actions are drawn from $b$. In on-policy TD(0) we do not calculate the expected value directly but instead average samples that are drawn from the target policy. These samples occur with frequencies given by the target policy probabilities, thus mimicking the weighted sum. If instead our samples are drawn from the behavior policy, their frequencies follow the behavior policy probabilities rather than the target policy probabilities.
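
A small numerical sketch of this reweighting, using made-up numbers rather than any environment in this notebook: one state and two actions whose one-step targets (standing in for $R_{t+1} + \gamma v_\pi(S_{t+1})$) are fixed at hypothetical values, with a hypothetical target policy `pi_probs` and behavior policy `b_probs`. A plain average of behavior-policy samples estimates the $b$-weighted sum, while weighting each sample by $\pi(a \vert s)/b(a \vert s)$ estimates the $\pi$-weighted sum, which is $v_\pi(s)$ in this toy setting.

```julia
using Random, Statistics

# Hypothetical one-state example with two actions and deterministic one-step targets
targets  = [1.0, 0.0]   # stand-in for R + gamma * v_pi(S') after each action
pi_probs = [0.9, 0.1]   # target policy pi(a|s)
b_probs  = [0.5, 0.5]   # behavior policy b(a|s)

rng = MersenneTwister(1)
n = 100_000
# draw actions from the behavior policy: action 1 with probability b_probs[1], else action 2
sampled = [rand(rng) < b_probs[1] ? 1 : 2 for _ in 1:n]

plain_avg = mean(targets[a] for a in sampled)                                # near 0.5 = sum_a b(a|s) * target(a)
weighted_avg = mean(pi_probs[a] / b_probs[a] * targets[a] for a in sampled)  # near 0.9 = sum_a pi(a|s) * target(a)
```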
So in order to correctly calculate the expected value we must multiply each behavior policy sample by $\frac{\pi(a \vert s)}{b(a \vert s)} = \frac{\pi(A_t \vert S_t)}{b(A_t \vert S_t)} = \rho_{t:t}$ resulting in the following update rule: $V(S_t) \leftarrow V(S_t) + \alpha [\rho_{t:t} \left ( R_{t+1} + \gamma V(S_{t+1}) \right ) - V(S_t)]$ """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$e4c6456c-867d-4ade-a3c8-310c1e065f14„§cell_idÙ$e4c6456c-867d-4ade-a3c8-310c1e065f14¤code¿render_walk("eg1", l = nstates)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$3e767962-7339-4f35-a039-b5521a098ed5„§cell_idÙ$3e767962-7339-4f35-a039-b5521a098ed5¤codeÚ¸struct MDP_TD{S, A, F<:Function, G<:Function, H<:Function}
	states::Vector{S}
	statelookup::Dict{S, Int64}
	actions::Vector{A}
	actionlookup::Dict{A, Int64}
	state_init::G #function that produces an initial state for an episode
	step::F #function that produces reward and updated state given a state action pair
	isterm::H #function that returns true if the state input is terminal
	function MDP_TD(states::Vector{S}, actions::Vector{A}, state_init::G, step::F, isterm::H) where {S, A, F<:Function, G<:Function, H<:Function}
		statelookup = makelookup(states)
		actionlookup = makelookup(actions)
		new{S, A, F, G, H}(states, statelookup, actions, actionlookup, state_init, step, isterm)
	end
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$834e5810-77ea-4dfd-9f37-9d9dbf6585a4„§cell_idÙ$834e5810-77ea-4dfd-9f37-9d9dbf6585a4¤codeÙ?makelookup(v::Vector) = Dict(x => i for (i, x) in enumerate(v))¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$667666b9-3ab6-4836-953d-9878208103c9„§cell_idÙ$667666b9-3ab6-4836-953d-9878208103c9¤codeÙ8gridworld_Q_vs_sarsa_vs_expected_sarsa_solve(cliffworld)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$87fadfc0-2cdb-4be2-81ad-e8fdeffb690c„§cell_idÙ$87fadfc0-2cdb-4be2-81ad-e8fdeffb690c¤codeÚÝfunction show_mrp_state(id, states, rewards, index)
	reward = rewards[min(index, length(states))]
	state = states[min(index, length(states))]
	dir = reward == 0 ? "left" : "right"
	termcolor = if index >= length(states)
		"""
		#$id .term.$dir::before {
			background-color: red;
		}
		"""
	else
		""""""
	end
	activestate = collect('A':'Z')[state]
	HTML("""
	"""
	)
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$4019c974-dcaa-46c8-ac90-e6566a376ea1„§cell_idÙ$4019c974-dcaa-46c8-ac90-e6566a376ea1¤codeÚ6function begin_value_iteration_v(mdp::M, γ::T, V::Vector{T}; θ = eps(zero(T)), nmax=typemax(Int64)) where {T<:Real, M <: CompleteMDP{T}}
	valuelist = [copy(V)]
	value_iteration_v!(V, θ, mdp, γ, nmax, valuelist)
	π = form_random_policy(mdp)
	make_greedy_policy!(π, mdp, V, γ)
	return (valuelist, π)
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$4d4577b5-3753-450d-a247-ebd8c3e8f799„§cell_idÙ$4d4577b5-3753-450d-a247-ebd8c3e8f799¤codeÚ)function create_ϵ_greedy_policy(Q::Matrix{T}, ϵ::T; π = copy(Q), get_valid_inds = j -> 1:size(Q, 1)) where T<:Real
	vhold = zeros(T, size(Q, 1))
	for j in 1:size(Q, 2)
		vhold .= Q[:, j]
		make_ϵ_greedy_policy!(vhold, ϵ; valid_inds = get_valid_inds(j))
		π[:, j] .= vhold
	end
	return π
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$e19db54c-4b3c-42d1-b016-9620daf89bfb„§cell_idÙ$e19db54c-4b3c-42d1-b016-9620daf89bfb¤codeÚïbegin
	abstract type GridworldAction end
	struct Up <: GridworldAction end
	struct Down <: GridworldAction end
	struct Left <: GridworldAction end
	struct Right <: GridworldAction end
	struct GridworldState
		x::Int64
		y::Int64
	end
	rook_actions = [Up(), Down(), Left(), Right()]
	move(::Up, x, y) = (x, y+1)
	move(::Down, x, y) = (x, y-1)
	move(::Left, x, y) = (x-1, y)
	move(::Right, x, y) = (x+1, y)
	apply_wind(w, x, y) = (x, y+w)
	const wind_vals = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7„§cell_idÙ$ed4e863b-22dd-4d2b-88d0-b3a56d6713b7¤codeÙ example_6_5(;mdp = stochastic_gridworld, 
num_episodes = 400, action_display = king_action_display, policy_display = display_king_policy, use_stochastic_dp=true)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$393cd9d2-dd97-496e-b260-ec6e8b1c13b5„§cell_idÙ$393cd9d2-dd97-496e-b260-ec6e8b1c13b5¤codeÚqbegin
	struct FiniteAfterstateMDP{T<:Real, S1, S2, A} <: CompleteMDP{T}
		states::Vector{S1}
		afterstates::Vector{S2}
		actions::Vector{A}
		rewards::Vector{T}
		#probability transition function now has probabilities for each state/reward transition from each afterstate
		ptf::Array{T, 3}
		#each column contains the index of the afterstate reached from the state represented by the column index while taking the action represented by the row index
		afterstate_map::Matrix{Int64}
		#each column contains the reward value received from the state represented by the column index while taking the action represented by the row index
		reward_interim_map::Matrix{T}
		state_index::Dict{S1, Int64}
		afterstate_index::Dict{S2, Int64}
		action_index::Dict{A, Int64}
		function FiniteAfterstateMDP{T, S1, S2, A}(states::Vector{S1}, afterstates::Vector{S2}, actions::Vector{A}, rewards::Vector{T}, ptf::Array{T, 3}, afterstate_map::Matrix{Int64}, reward_interim_map::Matrix{T}) where {T <: Real, S1, S2, A}
			new(states, afterstates, actions, rewards, ptf, afterstate_map, reward_interim_map, makelookup(states), makelookup(afterstates), makelookup(actions))
		end
	end

	FiniteAfterstateMDP(states::Vector{S1}, afterstates::Vector{S2}, actions::Vector{A}, rewards::Vector{T}, ptf::Array{T, 3}, afterstate_map::Matrix{Int64}, reward_interim_map::Matrix{T}) where {T <: Real, S1, S2, A} = FiniteAfterstateMDP{T, S1, S2, A}(states, afterstates, actions, rewards, ptf, afterstate_map, reward_interim_map)

	#if a reward map is not provided, assume that there are no intermediate rewards
	FiniteAfterstateMDP(states::Vector{S1}, afterstates::Vector{S2}, actions::Vector{A}, rewards::Vector{T}, ptf::Array{T, 3}, afterstate_map::Matrix{Int64}) where {T <: Real, S1, S2, A} = 
		FiniteAfterstateMDP{T, S1, S2, A}(states, afterstates, actions, rewards, ptf, afterstate_map, zeros(T, length(actions), length(states)))
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$401831c3-3925-465c-a093-28686f0dad2e„§cell_idÙ$401831c3-3925-465c-a093-28686f0dad2e¤codeÙsinitialize_state_value(mdp::MDP_TD; vinit::T = 0.0f0) where T<:AbstractFloat = ones(T, length(mdp.states)) .* vinit¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$2d881aa9-1da3-4d1e-8d05-245956dbaf33„§cell_idÙ$2d881aa9-1da3-4d1e-8d05-245956dbaf33¤codeÚHTML(""" """)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$047a8881-c2ec-4dd1-8778-e3acf9beba2e„§cell_idÙ$047a8881-c2ec-4dd1-8778-e3acf9beba2e¤codeÙYmd""" #### Sarsa vs Q-learning vs Expected Sarsa Performance on Cliff Walking Example """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$29b0a2d5-9629-46cd-b57c-6f3ef797de66„§cell_idÙ$29b0a2d5-9629-46cd-b57c-6f3ef797de66¤codeÚÆmd""" ## 6.7 Maximization Bias and Double Learning

All the control algorithms that we have discussed so far involve maximization in the construction of their target policies. For example, in Q-learning the target policy is the greedy policy given the current action values, which is defined with a max, and in Sarsa the policy is often $\epsilon$-greedy, which also involves a maximization operation. In these algorithms, a maximum over estimated values is used implicitly as an estimate of the maximum value, which can lead to a significant positive bias. To see why, consider a single state $s$ where there are many actions $a$ whose true values, $q(s, a)$, are all zero but whose estimated values, $Q(s, a)$, are uncertain and thus distributed some above and some below zero. The maximum of the true values is zero, but the maximum of the estimates is positive, a positive bias. We call this *maximization bias*.
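
The single-state argument above can be checked with a short simulation. The sketch below is an illustration under assumed parameters, not code from this notebook: `max_of_noisy_estimates` is a hypothetical name, every true action value is 0, and each estimate $Q(s, a)$ is modeled as that true value plus independent $\mathcal{N}(0, 1)$ noise. Even though $\max_a q(s, a) = 0$, the average of $\max_a Q(s, a)$ is positive and grows with the number of actions.

```julia
using Random, Statistics

# Hypothetical illustration: all true action values are 0 and each estimate Q(s, a)
# is the true value plus independent N(0, 1) noise. Returns the average of
# max_a Q(s, a) over many independent draws of the estimates.
function max_of_noisy_estimates(num_actions::Int, num_trials::Int; rng = MersenneTwister(0))
	maxima = [maximum(randn(rng, num_actions)) for _ in 1:num_trials]
	return mean(maxima)
end

max_of_noisy_estimates(1, 100_000)    # close to 0: with one action there is no maximization
max_of_noisy_estimates(10, 100_000)   # clearly positive (about 1.5) although max_a q(s, a) = 0
```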
To elaborate on the bias, consider just two random variables $X \sim \mathcal{N}(\theta_1, 1)$ and $Y \sim \mathcal{N}(\theta_2, 1)$. We would like to estimate $\max \left ( \mathbb{E}[X], \mathbb{E}[Y] \right ) = \max(\theta_1, \theta_2)$, and using the approach analogous to our learning algorithms we would calculate $\max(\overline{X}, \overline{Y}) = \max \left ( \sum_{i=1}^N \frac{x_i}{N}, \sum_{i=1}^M \frac{y_i}{M} \right )$. The problem with this approach is that for small numbers of samples the variance of each estimator is high, and we are using the same estimates both to select which random variable has the higher expected value and to estimate what that value is. Empirically, this results in a positive bias that gets worse the more variables we consider, as illustrated in the plot below. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$c1d6532c-38a4-488f-9789-07d63fe6f125„§cell_idÙ$c1d6532c-38a4-488f-9789-07d63fe6f125¤codeÙTmd""" Load Existing File if Present: $(@bind load_file CheckBox(default = true)) """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$e6672866-c0a0-46f2-bb52-25fcc3352645„§cell_idÙ$e6672866-c0a0-46f2-bb52-25fcc3352645¤codeÚ)md""" > ### *Exercise 6.5*
> In the right graph of the random walk example, the RMS error of the TD method seems to go down and then up again, particularly at high $\alpha$’s. What could have caused this? Do you think this always occurs, or might it be a function of how the approximate value function was initialized?

Since every state value was initialized to 0.5, the correct value for the center state, all of the values to the right must be increased and the values to the left must be decreased to reach the true values. Episodes that terminate on the right receive a reward of 1 and push up the rightmost estimate, while episodes that terminate on the left receive a reward of 0 and decrease the leftmost estimate.
The correct values for these two estimates are $\frac{5}{6}$ and $\frac{1}{6}$ respectively. Since there is an equal probability of exiting the walk on the right or the left, both ends of the chain will be updated at roughly the same rate. That means that both ends of the chain will move towards their correct values at about the same time, and if those updates stay somewhat synchronized, all of the states will pass through their correct values at a similar time. At the time when the values are roughly accurate, what happens if $\alpha=0.15$? In this case, consider an update for state E, assuming its estimate is already at the correct value, on an episode that exits to the right: $V(E) \leftarrow \frac{5}{6} + 0.15[1 - \frac{5}{6}] \approx 0.858 \gt \frac{5}{6}$. A similar effect pushes state A below its correct value on an episode that exits to the left: $V(A) \leftarrow \frac{1}{6} + 0.15[0 - \frac{1}{6}] \approx 0.142 \lt \frac{1}{6}$. The larger $\alpha$ is, the larger the over-correction on these transitions, and the feedback from the neighboring states is not enough to bring the estimates back to their correct values. Since all of the estimates pass through, or very close to, their correct values at about the same time, the RMS error reaches a minimum before the estimates overshoot or undershoot and the error rises again. If we had instead initialized the state values at 0, then the estimate at A would already be too low and would not get corrected until information from the right side propagated through. State E, however, will receive large updates for each episode that exits to the right, but the values for the states to its left will be too low. Since the state value estimates are not moving symmetrically, we won't have the same synchronized pass through the minimum error: at the time the E estimate is correct, the estimate at A will still have a large error. In this case, we are more likely to see the error continue to fall as more updates occur. Below is a visualization of the state estimates at different stages of training with the original initialization and with a 0 initialization.
In the 0 case, you can see the left-side estimates take a long time to reach their correct values, but with the original initialization all the estimates approach the correct values roughly together. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$223055df-7d5c-4d99-bc8d-fbc9702f906f„§cell_idÙ$223055df-7d5c-4d99-bc8d-fbc9702f906f¤codeÚmd""" ### Example 6.7: Maximization Bias Example

Consider an MDP with two non-terminal states A and B. Episodes always start in state A, where there are two actions, left and right. Choosing right always results in a reward of 0 and the episode terminating. Choosing left transitions to state B with a reward of 0, and from B there are many actions, all of which result in a terminal transition with random rewards. The distribution of rewards for each of these actions is $\mathcal{N}(-0.1, 1)$. The estimated value of (A, right) will always be 0 since that is the only possible sample to be collected. The estimated value of (A, left), however, will have higher variance, while its true value is -0.1. The problem with Q-learning is that, due to maximization bias, (A, left) will tend to have a higher value estimate than (A, right) when few samples have been collected, since it is very likely that at least one of the actions from B will have produced a reward greater than 0. The more of these actions there are, the worse the bias and the more samples need to be collected to remove it. If we employ Double Q-learning instead, we can eliminate this bias, since all of the actions from B share the same reward distribution. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$35dc0d94-145a-4292-b0df-9e84a286c036„§cell_idÙ$35dc0d94-145a-4292-b0df-9e84a286c036¤codeÚJmd""" ## 6.8 Games, Afterstates, and Other Special Cases

In the tic-tac-toe example we considered learning a value function for a state after the player's move but before the opponent's response.
This type of state is called an *afterstate*, and it is useful in situations where we know part of the environment's dynamics but the rest is stochastic or unknown. For example, we typically know the immediate effect of our moves, but not necessarily what happens after that. It can be more efficient to learn based on afterstates because there are fewer values to represent than with a full action-value function: any state-action pairs that lead to the same afterstate share a single value. These afterstate value functions can also be learned with generalized policy iteration. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$4d7619ee-933f-452a-9202-e95a8f3da20f„§cell_idÙ$4d7619ee-933f-452a-9202-e95a8f3da20f¤codeÚj@htl(""" Sarsa backup diagram. Black circles represent actions and white circles represent states.

""")¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$00d67a93-437c-4cda-899a-9daa1102e1f2„§cell_idÙ$00d67a93-437c-4cda-899a-9daa1102e1f2¤codeÙ[example_6_7_mdp(;num_episodes = 300, nruns = 10_000, num_actions = 10, load_file=load_file)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$500d8dd4-fc53-4021-b797-114224ca4deb„§cell_idÙ$500d8dd4-fc53-4021-b797-114224ca4deb¤codeÚqconst rook_action_display = @htl("""

Actions

""")¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$ff5d051e-5de1-48a9-9578-5dbafd71afd1„§cell_idÙ$ff5d051e-5de1-48a9-9578-5dbafd71afd1¤codeÙ|max_bias_visualization(;nvars_max = max_visual_params.nvars, nmax = max_visual_params.nmax, nruns = max_visual_params.nruns)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9„§cell_idÙ$e947f86e-8dc3-4ce7-a9d4-0a7b675a9fa9¤codeÚx#the value function in this case represents the value of each afterstate. The afterstates are listed in mdp.afterstates while the states are listed in mdp.states
begin_value_iteration_v(mdp::FiniteAfterstateMDP{T,S1, S2, A}, γ::T; Vinit::T = zero(T), kwargs...) where {T<:Real,S1,S2,A} = begin_value_iteration_v(mdp, γ, Vinit .* ones(T, length(mdp.afterstates)); kwargs...)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$a925534e-f9b8-471a-9d86-c9212129b630„§cell_idÙ$a925534e-f9b8-471a-9d86-c9212129b630¤codeÚ7md""" The following represents a trajectory taken by a policy in an environment. We seek to estimate $q_\pi(s, a)$ for the current behavior policy $\pi$ using the same TD method we introduced above. The update rule now, however, estimates the value of state-action pairs rather than the states themselves.
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3f„§cell_idÙ$7a5ff8f7-70d4-46f1-a4a7-bbfcec4f6e3f¤codeÙƒfunction sample_action(π::Matrix{T}, i_s::Integer) where T<:AbstractFloat
	#sample an action index for state i_s using the probabilities in column i_s of the policy
	n = size(π, 1)
	sample(1:n, weights(π[:, i_s]))
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$b5e06f59-33b5-414e-9a81-43e8abd07aa3„§cell_idÙ$b5e06f59-33b5-414e-9a81-43e8abd07aa3¤codeÚYmd""" Q-learning Solution $(show_gridworld_policy_value(noisy_gridworld, q_learning(noisy_gridworld, α_6_8, 1.0f0, num_episodes = 5_000); winds = fill(0, gridsize))) Double Q-learning Solution $(show_gridworld_policy_value(noisy_gridworld, double_q_learning(noisy_gridworld, α_6_8, 1.0f0, num_episodes = 1_000); winds = fill(0, gridsize))) """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$a0d2333f-e87b-4981-bb52-d436ec6481c1„§cell_idÙ$a0d2333f-e87b-4981-bb52-d436ec6481c1¤codeÚƒmd""" Because TD(0) bases its update in part on an existing estimate, we say that it is a *bootstrapping* method, like DP. We know from Chapter 3 that $\begin{flalign} v_\pi(s) & \doteq \mathbb{E}_\pi[G_t \mid S_t = s] \tag{6.3}\\ &= \mathbb{E}_\pi[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \tag{from (3.9)}\\ &=\mathbb{E}_\pi[R_{t+1} + \gamma v_\pi (S_{t+1}) \mid S_t = s] \tag{6.4} \end{flalign}$ Roughly speaking, Monte Carlo methods use an estimate of (6.3) as a target whereas DP methods use an estimate of (6.4) as a target. The Monte Carlo target is an estimate because the expected value in (6.3) is not known; a sample return is used in place of the real expected return. The DP target is an estimate not because of the expected values, which are assumed to be completely provided by a model of the environment, but because $v_\pi(S_{t+1})$ is not known and the current estimate, $V(S_{t+1})$, is used instead. The TD target is an estimate for both reasons: it samples the expected values in (6.4) *and* it uses the current estimate $V$ instead of the true $v_\pi$.
Thus, TD methods combine the sampling of Monte Carlo with the bootstrapping of DP. TD and Monte Carlo updates are both referred to as *sample updates* because they involve looking ahead to a sample successor state (or state-action pair). *Expected updates*, used in DP methods, use the complete distribution of all possible successor states rather than a single sample. Note that the quantity in the brackets in (6.2) is a sort of error, measuring the difference between the estimated value of $S_t$ and the better estimate $R_{t+1} + \gamma V(S_{t+1})$. This quantity is called the *TD error*: $\delta_t \doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \tag{6.5}$ The TD error depends on the subsequent state and reward, so it is not available until one step later; that is, $\delta_t$ is not known until time $t+1$. Also note that if we do not update $V$ during an episode (as we do not in Monte Carlo methods), then the Monte Carlo error can be written as a sum of TD errors: $\begin{flalign} G_t - V(S_t) &= R_{t+1} + \gamma G_{t+1} - V(S_t) + \gamma V(S_{t+1}) - \gamma V(S_{t+1}) \tag{from (3.9)} \\ &=\delta_t + \gamma(G_{t+1} - V(S_{t+1})) \tag{a}\\ &=\delta_t + \gamma \left ( \delta_{t+1} + \gamma(G_{t+2} - V(S_{t+2})) \right ) \tag{using (a)}\\ &=\delta_t + \gamma \delta_{t+1} + \gamma^2 \left ( G_{t+2} - V(S_{t+2}) \right ) \\ &=\delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1}\delta_{T-1} + \gamma^{T-t}(G_T - V(S_T)) \tag{applying (a) until termination}\\ &=\delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1}\delta_{T-1} + \gamma^{T-t}(0-0) \tag{definition of terminal state}\\ &=\sum_{k=t}^{T-1} \gamma^{k-t} \delta_k \tag{6.6} \end{flalign}$ This identity is not exact if $V$ is updated during the episode (as it is in TD(0)), but if the step size is small then it may still hold approximately.
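
Identity (6.6) can be checked numerically. The sketch below uses a made-up episode (the visited states, rewards, discount, and the fixed value table `V` are arbitrary assumptions, not taken from any example in this notebook) and compares the Monte Carlo error $G_t - V(S_t)$ with the discounted sum of TD errors at every time step. Because `V` is held fixed for the whole episode and the terminal state has value 0, the two agree up to floating-point rounding.

```julia
# Made-up episode. With 1-based arrays: states[k] holds S_{k-1} and rewards[k] holds R_k.
gamma = 0.9
states = [1, 2, 3, 2, 4]          # S_0, ..., S_T with T = 4; state 4 is terminal
rewards = [0.0, 1.0, -0.5, 2.0]   # R_1, ..., R_T
V = [0.3, -0.2, 0.7, 0.0]         # fixed value estimates; the terminal state has value 0
T = length(rewards)

# TD errors delta_t = R_{t+1} + gamma * V(S_{t+1}) - V(S_t) for t = 0, ..., T-1
delta = [rewards[k] + gamma * V[states[k+1]] - V[states[k]] for k in 1:T]

# Returns G_t computed backwards starting from G_T = 0
G = zeros(T + 1)
for k in T:-1:1
	G[k] = rewards[k] + gamma * G[k+1]
end

mc_error = [G[k] - V[states[k]] for k in 1:T]
td_sum = [sum(gamma^(j - k) * delta[j] for j in k:T) for k in 1:T]
isapprox(mc_error, td_sum)   # true, since V is not changed during the episode
```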
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$f841c4d8-5176-4007-b472-9e01a799d85c„§cell_idÙ$f841c4d8-5176-4007-b472-9e01a799d85c¤codeÙ4function addelements(e1, e2) """ $e1 $e2 """ end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$685a7ba3-0f94-4663-a68a-73fa03bd9445„§cell_idÙ$685a7ba3-0f94-4663-a68a-73fa03bd9445¤codeÚfunction make_greedy_policy!(π::Matrix{T}, mdp::FiniteAfterstateMDP{T, S1, S2, A}, V::Vector{T}, γ::T) where {T<:Real,S1,S2,A}
	for i_s in eachindex(mdp.states)
		#value of each action: interim reward plus the value of the resulting afterstate
		π[:, i_s] .= mdp.reward_interim_map[:, i_s] .+ V[mdp.afterstate_map[:, i_s]]
		maxv = -T(Inf)
		@inbounds @fastmath @simd for i_a in eachindex(mdp.actions)
			maxv = max(maxv, π[i_a, i_s])
		end
		#spread probability evenly over all maximizing actions
		π[:, i_s] .= (π[:, i_s] .≈ maxv)
		x = zero(T)
		@fastmath @inbounds @simd for i_a in eachindex(mdp.actions)
			x += π[i_a, i_s]
		end
		π[:, i_s] ./= x
	end
	return π
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc„§cell_idÙ$d5abd922-a8c2-4f5c-9a6e-d2490a8ad7dc¤codeÚq#take a step in the environment from state s using policy π
function takestep(mdp::MDP_TD{S, A, F, G, H}, π::Matrix{T}, s::S) where {S, A, F<:Function, G<:Function, H<:Function, T<:Real}
	i_s = mdp.statelookup[s]
	i_a = sample_action(π, i_s)
	a = mdp.actions[i_a]
	(r, s′) = mdp.step(s, a)
	i_s′ = mdp.statelookup[s′]
	return (i_s, i_s′, r, s′, a, i_a)
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$bce6e4ab-58ec-4e00-be34-bc4caf51f57d„§cell_idÙ$bce6e4ab-58ec-4e00-be34-bc4caf51f57d¤codeÙ¥function cum_mean(v::AbstractVector{T}) where T<:Real
	out = zeros(length(v))
	s = zero(T)
	for (i, x) in enumerate(v)
		s += x
		out[i] = s / i
	end
	return out
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$4ddcd409-c31c-444c-8fcf-7cc45b68d93b„§cell_idÙ$4ddcd409-c31c-444c-8fcf-7cc45b68d93b¤codeÙ÷function make_mrp(;l = 5)
	function step(s, a)
		x = s + rand(mrp_moves)
		#reward of 1 only when stepping past the right end (x == l+1)
		r = Float32(floor(x / (l+1)))
		#if a transition is terminal, mod will return 0
		(r, mod(x, l+1))
	end
	MDP_TD(collect(0:l), [1], () -> ceil(Int64, l/2), step, s -> s == 0)
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$c5d32889-634b-4b00-8ba7-0d1ecaf94f05„§cell_idÙ$c5d32889-634b-4b00-8ba7-0d1ecaf94f05¤codeÙinitialize_state_action_value(mdp::MDP_TD; qinit::T = 0.0f0) where T<:AbstractFloat = ones(T, length(mdp.actions), length(mdp.states)) .* qinit¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$3b16cbb7-f859-4871-9a63-8b40eb4191be„§cell_idÙ$3b16cbb7-f859-4871-9a63-8b40eb4191be¤codeÚžmd"""
> ### *Exercise 6.1*
> If $V$ changes during the episode, then (6.6) only holds approximately; what would the difference be between the two sides? Let $V_t$ denote the array of state values used at time $t$ in the TD error (6.5) and in the TD update (6.2). Redo the derivation above to determine the additional amount that must be added to the sum of TD errors in order to equal the Monte Carlo error.
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$902738c3-2f7b-49cb-8580-29359c857027„§cell_idÙ$902738c3-2f7b-49cb-8580-29359c857027¤codeÙM@htl(""" """)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$c93ed1f2-3c38-4f68-8bf8-2cdf4e7bee34„§cell_idÙ$c93ed1f2-3c38-4f68-8bf8-2cdf4e7bee34¤codeÚ{md"""
Now we can rewrite the Monte Carlo error using (3.9) again and proceed with the derivation, keeping track of the time index of the value estimates:

$\begin{flalign}
G_t - V_t(S_t) &= R_{t+1} + \gamma G_{t+1} - V_t(S_t) + \gamma V_{t}(S_{t+1}) - \gamma V_{t}(S_{t+1}) \tag{from (3.9)} \\
&= \delta_t + \gamma \left [ G_{t+1} - V_t(S_{t+1}) \right ] \\
&= \delta_t + \gamma \left [ G_{t+1} - V_{t+1}(S_{t+1}) + V_{t+1}(S_{t+1}) - V_t(S_{t+1}) \right ]
\end{flalign}$

Define the following:

$\eta_{t} \doteq V_{t+1}(S_{t+1}) - V_t(S_{t+1})$

which lets us rewrite the equation as

$G_t - V_t(S_t) = \delta_t + \gamma \eta_{t} + \gamma \left [ G_{t+1} - V_{t+1}(S_{t+1})\right ]$

Notice that the term in the brackets is equivalent to the left
hand side but shifted forward one time step. That implies the equation can be expanded recursively as we did with the original derivation. Expanding until termination, where $G_T = V_T(S_T) = 0$ and therefore $\eta_{T-1} = 0$, gives

$G_t - V_t(S_t) = \sum_{k=t}^{T-1} \gamma^{k-t} \delta_k + \sum_{k=t}^{T-1} \gamma^{k-t+1} \eta_k$

so the additional amount that must be added to the sum of TD errors is $\sum_{k=t}^{T-1} \gamma^{k-t+1} \eta_k$. Because TD(0) changes only $V(S_k)$ at step $k$, each term reduces to $\eta_k = \alpha \delta_k$ when $S_{k+1} = S_k$ and to $\eta_k = 0$ otherwise.
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$f36822d7-9ea8-4f5c-9925-dc2a466a68ba„§cell_idÙ$f36822d7-9ea8-4f5c-9925-dc2a466a68ba¤codeÙ%md""" # Dependencies and Settings """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$3e367811-247b-4bd6-b8fe-63f8996fb9e8„§cell_idÙ$3e367811-247b-4bd6-b8fe-63f8996fb9e8¤codeÙ#md""" ### Formal Proof for Bias """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1„§cell_idÙ$7de9b6a4-49ce-4dc3-9d5b-cecfcb98bba1¤codeÙCconst jacks_car_afterstate_mdp = create_car_rental_afterstate_mdp()¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$c4719c42-87aa-482a-95aa-a1492d42835d„§cell_idÙ$c4719c42-87aa-482a-95aa-a1492d42835d¤codeÙ#md""" #### Stochastic Gridworld """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$495f5606-0567-47ad-a266-d21320eecfc6„§cell_idÙ$495f5606-0567-47ad-a266-d21320eecfc6¤codeÚ¿md"""
The Monte Carlo nonstationary update rule for the value function is

$V(S_t) \leftarrow V(S_t) + \alpha [G_t - V(S_t)] \tag{6.1}$

where $G_t$ is the actual return following time $t$, and $\alpha$ is a constant step-size parameter. Call this method *constant-α MC*. Using a constant step size α instead of the usual sample average is what makes this estimation method suitable for nonstationary problems. Because the return $G_t$ is required, this method must wait until the end of an episode before it can update. In contrast, TD methods need only wait until the following time step. The simplest TD update rule is

$V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)] \tag{6.2}$

where the update can be made immediately on transition to $S_{t+1}$ after receiving $R_{t+1}$. This TD method is called TD(0), or *one-step TD*.
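
The minimal sketch below compares the two rules on a single recorded episode. It is separate from the notebook's `MDP_TD`-based implementation. The function names, the integer state indices, and the convention that the last entry of `states` is a terminal state with value 0 are assumptions made for illustration. Take the episode 3, 4, 5, terminal, with a final reward of 1, all values starting at 0.5, α = 0.1, and γ = 1. On that episode, TD(0) changes only the value of state 5 (to 0.55). Constant-α MC moves all three visited states toward the return of 1 (each to 0.55), because it must first wait for $G_t$.

```julia
# Episode format (assumed for this sketch): rewards[k] is received after
# leaving states[k]; states has one more entry than rewards, and its last
# entry is the terminal state, whose value stays 0.

# Constant-α MC (6.1): work backward from the end of the episode to form G_t.
function constant_alpha_mc!(V, states, rewards; α = 0.1, γ = 1.0)
    G = 0.0
    for k in length(rewards):-1:1
        G = rewards[k] + γ * G
        V[states[k]] += α * (G - V[states[k]])
    end
    return V
end

# TD(0) (6.2): update after every transition using the current estimate of the next state.
function td0!(V, states, rewards; α = 0.1, γ = 1.0)
    for k in eachindex(rewards)
        s, s_next = states[k], states[k+1]
        V[s] += α * (rewards[k] + γ * V[s_next] - V[s])
    end
    return V
end
```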
See below for code implementing this. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$0a4ed8c7-27ca-45cb-af15-70ddd86240fb„§cell_idÙ$0a4ed8c7-27ca-45cb-af15-70ddd86240fb¤codeÙ5md""" #### Batch Method Estimation Implementation """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$cdedd35e-52b8-40a5-938d-2d36f6f93217„§cell_idÙ$cdedd35e-52b8-40a5-938d-2d36f6f93217¤codeÚÙconst king_action_display = @htl("""

Actions

""")¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$3756a3f8-18e8-4d62-afa1-cfeb4183820c„§cell_idÙ$3756a3f8-18e8-4d62-afa1-cfeb4183820c¤codeÚ function double_expected_sarsa(mdp::MDP_TD{S, A, F, G, H}, α::T, γ::T;
	num_episodes = 1000,
	qinit = zero(T),
	ϵinit = one(T)/10,
	Qinit::Matrix{T} = initialize_state_action_value(mdp; qinit=qinit),
	decay_ϵ = false,
	target_policy_function! = (v, ϵ, s) -> make_ϵ_greedy_policy!(v, ϵ),
	behavior_policy_function! = (v, ϵ, s) -> make_ϵ_greedy_policy!(v, ϵ),
	πinit_target::Matrix{T} = create_ϵ_greedy_policy(Qinit, ϵinit),
	πinit_behavior::Matrix{T} = create_ϵ_greedy_policy(Qinit, ϵinit),
	save_state::S = first(mdp.states),
	save_history = false) where {S, A, F, G, H, T<:AbstractFloat}

	terminds = findall(mdp.isterm(s) for s in mdp.states)
	Q1 = copy(Qinit)
	Q2 = copy(Qinit)
	Q1[:, terminds] .= zero(T)
	Q2[:, terminds] .= zero(T)
	π_target1 = copy(πinit_target)
	π_target2 = copy(πinit_target)
	π_behavior = copy(πinit_behavior)
	vhold1 = zeros(T, length(mdp.actions))
	vhold2 = zeros(T, length(mdp.actions))
	vhold3 = zeros(T, length(mdp.actions))
	#keep track of rewards and steps per episode as a proxy for training speed
	rewards = zeros(T, num_episodes)
	steps = zeros(Int64, num_episodes)
	if save_history
		action_history = Vector{A}(undef, num_episodes)
	end
	for ep in 1:num_episodes
		ϵ = decay_ϵ ? ϵinit/ep : ϵinit
		s = mdp.state_init()
		rtot = zero(T)
		l = 0
		while !mdp.isterm(s)
			(i_s, i_s′, r, s′, a, i_a) = takestep(mdp, π_behavior, s)
			if save_history && (s == save_state)
				action_history[ep] = a
			end
			#randomly choose which estimate to update; the other estimate supplies the values
			toggle = rand() < 0.5
			q_expected = if toggle
				sum(π_target2[i, i_s′]*Q1[i, i_s′] for i in eachindex(mdp.actions))
			else
				sum(π_target1[i, i_s′]*Q2[i, i_s′] for i in eachindex(mdp.actions))
			end
			if toggle
				Q2[i_a, i_s] += α*(r + γ*q_expected - Q2[i_a, i_s])
			else
				Q1[i_a, i_s] += α*(r + γ*q_expected - Q1[i_a, i_s])
			end
			#update terms for next step
			if toggle
				vhold2 .= Q2[:, i_s]
				target_policy_function!(vhold2, ϵ, s)
				π_target2[:, i_s] .= vhold2
			else
				vhold1 .= Q1[:, i_s]
				target_policy_function!(vhold1, ϵ, s)
				π_target1[:, i_s] .= vhold1
			end
			#the behavior policy is built from the sum of both action-value estimates at this state
			vhold3 .= Q1[:, i_s] .+ Q2[:, i_s]
			behavior_policy_function!(vhold3, ϵ, s)
			π_behavior[:, i_s] .= vhold3
			s = s′
			l += 1
			rtot += r
		end
		steps[ep] = l
		rewards[ep] = rtot
	end
	Q1 .+= Q2
	Q1 ./= 2
	plain_return = Q1, create_greedy_policy(Q1), steps, rewards
	save_history && return (plain_return..., action_history)
	return plain_return
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$04a0be81-ee5f-4eeb-963a-ad930392d50b„§cell_idÙ$04a0be81-ee5f-4eeb-963a-ad930392d50b¤codeexample_6_5()¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$136d1d96-b590-4f03-9e42-2337efc560cc„§cell_idÙ$136d1d96-b590-4f03-9e42-2337efc560cc¤codeÚŸHTML(""" """)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0„§cell_idÙ$6bffb08c-704a-4b7c-bfce-b3d099cf35c0¤codeÚƒfunction gridworld_Q_vs_sarsa_solve(mdp; α=0.5f0, ϵ=0.1f0, num_episodes = 500, nruns = 100)
	function addtuple(t1, t2)
		Tuple(t1[i] .+ t2[i] for i in eachindex(t1))
	end
	sarsa_results = mapreduce(addtuple, 1:nruns) do _
		sarsa(mdp, α, 1.0f0; num_episodes = num_episodes, ϵinit = ϵ)
	end
	qlearning_results =
mapreduce(addtuple, 1:nruns) do _
		q_learning(mdp, α, 1.0f0; num_episodes = num_episodes, ϵinit = ϵ)
	end
	p1 = plot_path(mdp, create_greedy_policy(sarsa_results[1] ./ nruns); windtext = fill("", 12), xtitle = "", title = "Cliff Walking Sarsa Path")
	p2 = plot_path(mdp, qlearning_results[2] ./ nruns; windtext = fill("", 12), xtitle = "", title = "Cliff Walking Q Learning Path")
	traces = [scatter(x = 1:num_episodes, y = results[4] ./ nruns, name = name) for (results, name) in zip([sarsa_results, qlearning_results], ["Sarsa", "Q-learning"])]
	p3 = plot(traces, Layout(xaxis_title = "Episodes", yaxis = attr(title = "Sum of rewards during episode", range = [-100, -15])))
	steptraces = [scatter(x = 1:num_episodes, y = results[3] ./ nruns, name = name) for (results, name) in zip([sarsa_results, qlearning_results], ["Sarsa", "Q-learning"])]
	p4 = plot(steptraces, Layout(xaxis_title = "Episodes", yaxis = attr(title = "Average steps per episode
during training", range = [0, 100]))) @htl("""

$p1 $p2

$p3 $p4 """) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$f95ceb98-f12e-4650-9ad3-0609b7ecd0f3„§cell_idÙ$f95ceb98-f12e-4650-9ad3-0609b7ecd0f3¤codeÚÀmd"""
> ### *Exercise 6.14*
> Describe how the task of Jack's Car Rental (Example 4.2) could be reformulated in terms of afterstates. Why, in terms of this specific task, would such a reformulation be likely to speed convergence?

In the original problem the state is the number of cars at each location at the end of the day, and the actions are the net numbers of cars moved between the two locations overnight. With an afterstate approach, the value function would only consider the number of cars at each location after the overnight moves are made. This is equivalent to valuing the state the following morning, when customers begin to rent and return cars. The random rentals and returns that occur the following day have a good or bad outcome depending only on the cars available at each location at the start of that day. This approach would likely converge faster because we only model the value of the configuration that directly determines whether cars will be available. As in the tic-tac-toe example, many different state-action pairs lead to the same afterstate. Since the value of each pair is its immediate moving cost plus the value of its afterstate, one update to an afterstate value improves the estimates for all of those pairs at once. See below for code that creates the car rental MDP and solves it using value iteration with afterstates.
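
Counting pairs can check the claim that many state-action pairs share an afterstate. The sketch below assumes the usual setup of Example 4.2: at most 20 cars per location, at most 5 cars moved overnight, and any cars beyond 20 removed. The helper name is hypothetical. It is independent of `create_car_rental_afterstate_mdp`, and its conventions may not match that function exactly.

```julia
# Count feasible (state, action) pairs and the distinct afterstates they lead to.
# State: (n1, n2) cars at each location; action a moves a cars from location 1
# to location 2 (negative a moves cars from location 2 to location 1).
function count_car_afterstates(; maxcars = 20, maxmove = 5)
    afterstates = Set{Tuple{Int,Int}}()
    npairs = 0
    for n1 in 0:maxcars, n2 in 0:maxcars, a in -maxmove:maxmove
        # an action is only feasible if the source location has enough cars
        (a > 0 && n1 < a) && continue
        (a < 0 && n2 < -a) && continue
        npairs += 1
        # cars beyond maxcars are removed from the system
        push!(afterstates, (min(n1 - a, maxcars), min(n2 + a, maxcars)))
    end
    return npairs, length(afterstates)
end
```

Under these assumptions, `count_car_afterstates()` returns `(4221, 441)`. That means about 9.6 state-action pairs share each afterstate value on average.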
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$8787a5fd-d0ab-46b5-a7df-e7bc103a7378„§cell_idÙ$8787a5fd-d0ab-46b5-a7df-e7bc103a7378¤codeÚ|function value_iteration_v!(V, θ, mdp, γ, nmax, valuelist)
	nmax <= 0 && return valuelist
	#update value function
	delt = bellman_optimal_value!(V, mdp, γ)
	#add copy of value function to results list
	push!(valuelist, copy(V))
	#halt when value function is no longer changing
	delt <= θ && return valuelist
	value_iteration_v!(V, θ, mdp, γ, nmax - 1, valuelist)
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$03a06e10-f68a-403c-97bf-7a7627f2c5d6„§cell_idÙ$03a06e10-f68a-403c-97bf-7a7627f2c5d6¤codeÚìmd"""
Hasselt, in his paper, proposes an alternative **Double Estimator** to correct this bias in approximating $\max_i \mathbb{E} \{ X_i \}$, which uses two sets of estimators: $\mu^A = \{ \mu_1^A, \dots, \mu_M^A \}$ and $\mu^B = \{ \mu_1^B, \dots, \mu_M^B \}$. Both sets of estimators are updated with a subset of the samples we draw, such that $S = S^A \cup S^B$, $S^A \cap S^B = \emptyset$, $\mu_i^A(S) = \frac{1}{\vert S_i^A \vert } \sum_{s \in S_i^A} s$, and $\mu_i^B(S) = \frac{1}{\vert S_i^B \vert } \sum_{s \in S_i^B} s$. Like the single estimator $\mu_i$, both $\mu_i^A$ and $\mu_i^B$ are unbiased if we assume that the samples are split in a proper manner, for instance randomly, over the two sets of estimators. Let $Max^A (S) \doteq \{ j \mid \mu_j^A (S) = \max_i \mu_i^A (S) \}$ be the set of maximal estimates in $\mu^A(S)$. Since $\mu^B$ is an independent, unbiased set of estimators, we have $\mathbb{E} \{ \mu_j^B \} = \mathbb{E} \{ X_j \}$ for all $j$, including all $j \in Max^A$. Let $a^*$ be an estimator that maximizes $\mu^A$, that is, $\mu_{a^*}^A(S) \doteq \max_i \mu_i ^A (S)$. If there are multiple estimators that maximize $\mu^A$, we can for instance pick one at random.
Then we can use $\mu_{a^*}^B$ as an estimate for $\max_i \mathbb{E} \{ \mu_i^B \}$, and therefore also for $\max_i \mathbb{E} \{ X_i \}$, and we obtain the approximation

$$\max_i \mathbb{E} \{ X_i \} = \max_i \mathbb{E} \{ \mu_i^B \} \approx \mu_{a^*}^B \tag{e}$$

As we gain more samples, the variance of the estimators decreases. In the limit, $\mu_i^A(S) = \mu_i^B(S) = \mathbb{E} \{ X_i \}$ for all $i$ and the approximation in $(e)$ converges to the correct result. Assume that the underlying PDFs are continuous. The probability $P(j = a^*)$ for any $j$ is then equal to the probability that all $i \neq j$ give lower estimates. Thus $\mu_j^A(S) = x$ is maximal for some value $x$ with probability $\prod_{i \neq j}^M P(\mu_i ^A \lt x)$. Integrating out $x$ gives $P(j = a^*) = \int_{-\infty}^\infty P(\mu_j^A = x) \prod_{i \neq j}^M P(\mu_i^A < x)dx \doteq \int_{-\infty}^\infty f_j^A(x) \prod_{i \neq j}^M F_i^A(x) dx$, where $f_i^A$ and $F_i^A$ are the PDF and CDF of $\mu_i^A$. The expected value of the approximation by the double estimator can thus be given by

$$\sum_j^M P(j = a^*) \mathbb{E} \{ \mu_j^B \} = \sum_j^M \mathbb{E} \{ \mu_j ^B \} \int_{-\infty}^\infty f_j^A(x) \prod_{i \neq j} F_i^A(x)dx \tag{f}$$

For discrete PDFs, the probability that two or more estimators are equal should be taken into account, and the integrals should be replaced with sums. Comparing (f) to (c), we see the difference is that the double estimator uses $\mathbb{E} \{ \mu_j^B \}$ in place of $x$. The single estimator overestimates because $x$ is within the integral and therefore correlates with the monotonically increasing product $\prod_{i \neq j} F_i^\mu(x)$. The double estimator underestimates because the probabilities $P(j = a^*)$ sum to one. The approximation is therefore a weighted average of unbiased expected values, which must be lower than or equal to the maximum expected value.
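
A small simulation makes the single estimator's overestimation visible. The sketch below uses a hypothetical helper, independent of `max_bias_visualization_comp` in this notebook. It assumes every $X_i$ is standard normal, so that $\max_i \mathbb{E}\{X_i\} = 0$. It forms $S^A$ and $S^B$ by splitting each variable's samples into odd- and even-indexed halves. Under these assumptions the single estimator should average clearly above 0. The double estimator should average close to 0, because with equal true means the selected $\mu_{a^*}^B$ is still an unbiased sample mean.

```julia
using Random, Statistics

# Compare the single estimator max_i μ_i with the double estimator μ^B_{a*}
# when every X_i is Normal(0, 1), so the true max_i E{X_i} is 0.
function compare_max_estimators(; M = 10, n = 20, nruns = 20_000, seed = 1)
    rng = MersenneTwister(seed)
    single = 0.0
    double = 0.0
    for _ in 1:nruns
        X = randn(rng, n, M)                            # n samples for each of M variables
        μ = vec(mean(X; dims = 1))                      # single estimator uses all samples
        μA = vec(mean(view(X, 1:2:n, :); dims = 1))     # S^A: odd-indexed samples
        μB = vec(mean(view(X, 2:2:n, :); dims = 1))     # S^B: even-indexed samples
        single += maximum(μ)
        double += μB[argmax(μA)]                        # a* chosen with μ^A, value read from μ^B
    end
    return single / nruns, double / nruns
end
```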
In the following lemma, which holds in both the discrete and the continuous case, we prove in general that the estimate $\mathbb{E} \{ \mu_{a^*}^B \}$ is not an unbiased estimate of $\max_i \mathbb{E} \{ X_i \}$.
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$0d6a11af-b146-4bbc-997e-a11b897269a7„§cell_idÙ$0d6a11af-b146-4bbc-997e-a11b897269a7¤codeÙ,md""" ## 6.4 Sarsa: On-policy TD Control """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$72b4d8d5-464c-4561-8c69-28ef3f59630b„§cell_idÙ$72b4d8d5-464c-4561-8c69-28ef3f59630b¤codeÚ#update the value function with the MC method using a single episode
function update_value!(V::Vector{T}, ::MC, α::T, γ::T, mdp::MDP_TD{S, A, F, G, H}, states::Vector{S}, actions::Vector{A}, rewards::Vector{T}) where {T<:AbstractFloat, S, A, F<:Function, G<:Function, H<:Function}
	l = length(states)
	g = zero(T)
	err = zero(T)
	for i in l:-1:1
		g = γ*g + rewards[i]
		s = states[i]
		i_s = mdp.statelookup[s]
		v_old = V[i_s]
		v_new = v_old + α*(g-v_old)
		err = max(err, calc_error(v_old, v_new))
		V[i_s] = v_new
	end
	return err
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$47c2cbdd-f6db-4ce5-bae2-8141f30aacbc„§cell_idÙ$47c2cbdd-f6db-4ce5-bae2-8141f30aacbc¤codeÚÿmd"""
### Example 6.2 Random Walk

In this example we empirically compare the prediction abilities of TD(0) and constant-α MC when applied to the following Markov reward process:

In this MRP the agent's actions are irrelevant. At each step the state transitions either to the left or to the right with equal probability. An episode ends when the transition terminates at the left or right end of the chain. If the agent exits to the right, it receives a reward of 1; all other transitions receive a reward of 0. Below is an animation of the agent randomly moving through an episode. Longer chains have longer episodes on average, with the expected length growing roughly quadratically with the length of the chain. Underneath the visualizations is the code.
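
A short simulation can check the claim about episode length. This sketch is independent of the notebook's `make_mrp`. It assumes $n$ nonterminal states with the episode starting in the middle state. It compares the simulated average length with the standard gambler's-ruin expected duration $k(n+1-k)$ for a start in state $k$, which grows quadratically in $n$. The function names are hypothetical.

```julia
using Random, Statistics

# Length of one episode of the symmetric random walk with nonterminal states
# 1..n and terminal states 0 and n + 1, starting from the middle state
# (rounded up when n is even).
function random_walk_episode_length(n, rng)
    s = cld(n + 1, 2)
    steps = 0
    while 0 < s < n + 1
        s += rand(rng, Bool) ? 1 : -1   # move left or right with probability 1/2
        steps += 1
    end
    return steps
end

# Average episode length over many simulated episodes
function mean_episode_length(n; nepisodes = 100_000, seed = 1)
    rng = MersenneTwister(seed)
    return mean(random_walk_episode_length(n, rng) for _ in 1:nepisodes)
end

# Gambler's-ruin expected duration k * (n + 1 - k) from the start state k
function expected_episode_length(n)
    k = cld(n + 1, 2)
    return k * (n + 1 - k)
end
```

For the five-state chain the formula gives 9 steps, and the simulated average should agree closely. For a 19-state chain the formula gives 100 steps.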
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$8224b808-5778-458b-b683-ea2603c82117„§cell_idÙ$8224b808-5778-458b-b683-ea2603c82117¤codeÙ(md""" ### Example 6.6: Cliff Walking """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$c4919d14-8cba-43e6-9369-efc52bcb9b23„§cell_idÙ$c4919d14-8cba-43e6-9369-efc52bcb9b23¤codeÚ#function make_greedy_policy!(π::Matrix{T}, mdp::FiniteMDP{T, S, A}, V::Vector{T}, γ::T) where {T<:Real,S,A}
	for i_s in eachindex(mdp.states)
		maxv = typemin(T)
		for i_a in eachindex(mdp.actions)
			#expected one-step return of action i_a from state i_s
			x = zero(T)
			for i_r in eachindex(mdp.rewards)
				for i_s′ in eachindex(V)
					x += mdp.ptf[i_s′, i_r, i_a, i_s] * (mdp.rewards[i_r] + γ * V[i_s′])
				end
			end
			maxv = max(maxv, x)
			π[i_a, i_s] = x
		end
		#spread probability evenly over all maximizing actions
		π[:, i_s] .= (π[:, i_s] .≈ maxv)
		π[:, i_s] ./= sum(π[i_a, i_s] for i_a in eachindex(mdp.actions))
	end
	return π
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$05664aaf-575b-4249-974c-d8a2e63f380a„§cell_idÙ$05664aaf-575b-4249-974c-d8a2e63f380a¤codeÚÖmd"""
> ### *Exercise 6.11*
> Why is Q-learning considered an *off-policy* control method?

In the on-policy Sarsa update, the expected value being estimated at each state-action pair is

$Q_\pi(S_t, A_t) = \text{E}_\pi [R_{t+1} + \gamma Q_\pi(S_{t+1}, A_{t+1})]$

which we estimate with samples of $A_{t+1}$ drawn from the same policy $\pi$ that is being evaluated. In Q-learning, the quantity being estimated is instead

$Q_*(S_t, A_t) = \text{E} [R_{t+1} + \gamma \max_a Q_*(S_{t+1}, a)]$

The behavior policy that selects the action actually taken from state $S_{t+1}$ is $\epsilon$-greedy, so the next action may not match the maximizing action. The Q-learning update therefore estimates the optimal action-value function, which is the value of the greedy target policy, rather than the value of the $\epsilon$-greedy behavior policy that generates the data. Sarsa, in contrast, learns the values of the same policy it follows, which makes it an on-policy method.
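
The difference is easiest to see in the update targets themselves. The sketch below computes the Sarsa, Q-learning, and Expected Sarsa targets for one transition. It uses hypothetical helper names and stores $Q$ as an actions × states matrix, as the notebook's code does. The ε-greedy policy is built by hand and breaks ties with the first maximizing action.

```julia
# Q is an (actions × states) matrix; r is the reward and s_next the next state index.

# ε-greedy action probabilities for state s (ties go to the first maximizing action)
function epsilon_greedy_probs(Q, s, ϵ)
    nA = size(Q, 1)
    p = fill(ϵ / nA, nA)
    p[argmax(view(Q, :, s))] += 1 - ϵ
    return p
end

# Sarsa: uses the action a_next actually selected by the behavior policy in s_next
sarsa_target(Q, r, s_next, a_next; γ = 1.0) = r + γ * Q[a_next, s_next]

# Q-learning: uses the greedy action, regardless of what the behavior policy does next
qlearning_target(Q, r, s_next; γ = 1.0) = r + γ * maximum(view(Q, :, s_next))

# Expected Sarsa: averages over the ε-greedy policy's action probabilities in s_next
function expected_sarsa_target(Q, r, s_next, ϵ; γ = 1.0)
    p = epsilon_greedy_probs(Q, s_next, ϵ)
    return r + γ * sum(p .* view(Q, :, s_next))
end
```

Take action values of 1 and 3 in the next state, a reward of −1, ϵ = 0.1, and an exploratory next action 1. The Sarsa target is 0, the Q-learning target is 2, and the Expected Sarsa target is 1.9. Only the Q-learning target ignores the exploration in the behavior policy.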
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$dda222ef-8178-40bb-bf20-d242924c4fab„§cell_idÙ$dda222ef-8178-40bb-bf20-d242924c4fab¤codeÙBconst king_gridworld = make_windy_gridworld(;actions=king_actions)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$48b557e3-e239-45e9-ab15-105bcca96492„§cell_idÙ$48b557e3-e239-45e9-ab15-105bcca96492¤codeÚ´md"""
## 6.3 Optimality of TD(0)

Suppose there is available only a finite amount of experience, say 10 episodes or 100 time steps. In this case, a common approach with incremental learning methods is to present the experience repeatedly until the method converges upon an answer. Given an approximate value function $V$, the increments specified by (6.1) or (6.2) are computed for every time step $t$ at which a nonterminal state is visited, but the value function is changed only once, by the sum of all the increments. Then all the available experience is processed again with the new value function to produce a new overall increment, and so on, until the value function converges. We call this *batch updating* because updates are made only after processing each complete *batch* of training data.

Under batch updating, TD(0) converges deterministically to a single answer independent of the step-size parameter, $\alpha$, as long as $\alpha$ is chosen to be sufficiently small. The constant-$\alpha$ MC method also converges deterministically under the same conditions, but to a different answer. Understanding these two answers will help us understand the difference between the two methods. Under normal updating the methods do not move all the way to their respective batch answers, but in some sense they take steps in these directions. Before trying to understand the two answers in general, for all possible tasks, we first look at a few examples.

### Example 6.3: Random walk under batch updating

Batch-updating versions of TD(0) and constant-$\alpha$ MC were applied as follows to the random walk prediction example (Example 6.2). After each new episode, all episodes seen so far were treated as a batch. They were repeatedly presented to the algorithm, either TD(0) or constant-$\alpha$ MC, with $\alpha$ sufficiently small that the value function converged. The resulting value function was then compared with $v_\pi$. The average root mean square error across the five states (and across 100 independent repetitions of the whole experiment) was plotted to obtain the learning curves shown in Figure 6.2. Note that the batch TD method was consistently better than the batch Monte Carlo method.

Under batch training, constant-$\alpha$ MC converges to values, $V(s)$, that are sample averages of the actual returns experienced after visiting each state $s$. These are optimal estimates in the sense that they minimize the mean square error from the actual returns in the training set. In this sense it is surprising that the batch TD method was able to perform better according to the root mean square error measure shown in Figure 6.2. How is it that batch TD was able to perform better than this optimal method? The answer is that the Monte Carlo method is optimal only in a limited way, and that TD is optimal in a way that is more relevant to predicting returns. Below is code implementing both batch methods in general for arbitrary MDPs.
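
Before that general implementation, here is a minimal standalone sketch that shows the two batch answers diverging on a tiny batch. The function names, the `(s, r, s_next)` transition format, and the use of state index 0 for the terminal state are assumptions made for this illustration. Each function sums the increments of its update rule over the whole batch, applies them once, and repeats until the values stop changing.

```julia
# Transitions are (s, r, s_next) tuples with nonterminal states 1..nstates;
# s_next == 0 marks the terminal state, whose value is fixed at 0.

# Batch TD(0): sum the increments of (6.2) over every transition in the batch,
# apply them once, and repeat until the value function stops changing.
function batch_td0(episodes, nstates; α = 0.01, γ = 1.0, θ = 1e-10, maxiters = 100_000)
    V = zeros(nstates)
    Δ = zeros(nstates)
    for _ in 1:maxiters
        fill!(Δ, 0.0)
        for ep in episodes, (s, r, s_next) in ep
            v_next = s_next == 0 ? 0.0 : V[s_next]
            Δ[s] += α * (r + γ * v_next - V[s])
        end
        V .+= Δ
        maximum(abs, Δ) < θ && break
    end
    return V
end

# Batch constant-α MC: the same procedure with the increments of (6.1),
# which move each V(s) toward the returns actually observed after visiting s.
function batch_mc(episodes, nstates; α = 0.01, γ = 1.0, θ = 1e-10, maxiters = 100_000)
    V = zeros(nstates)
    Δ = zeros(nstates)
    for _ in 1:maxiters
        fill!(Δ, 0.0)
        for ep in episodes
            G = 0.0
            for (s, r, _) in reverse(ep)
                G = r + γ * G
                Δ[s] += α * (G - V[s])
            end
        end
        V .+= Δ
        maximum(abs, Δ) < θ && break
    end
    return V
end
```

The test batch uses two states, A = 1 and B = 2, and eight episodes: A, 0, B, 0 once; B, 1 six times; B, 0 once. These are the data of the book's Example 6.4. On this batch, batch TD(0) converges to V(A) = V(B) = 0.75. Batch MC gives V(A) = 0, because the only return ever observed from A was 0.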
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$846720cc-550a-4a3c-a80e-40b99671f4e2„§cell_idÙ$846720cc-550a-4a3c-a80e-40b99671f4e2¤code¹const mrp_moves = [-1, 1]¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$6556dafb-04fa-434c-868a-8d7bb7b5b196„§cell_idÙ$6556dafb-04fa-434c-868a-8d7bb7b5b196¤codeÚ%function make_cliffworld(;actions = rook_actions, xmax = 12, ymax = 4, cliff_penalty::T = -100f0, step_reward::T = -1f0) where T<:AbstractFloat
	start = GridworldState(1, 1)
	sinit() = start
	isterm(s) = s == GridworldState(xmax, 1)
	states = [GridworldState(x, y) for x in 1:xmax for y in 1:ymax]
	boundstate(x::Int64, y::Int64) = (clamp(x, 1, xmax), clamp(y, 1, ymax))
	#landing on the bottom row between the start and the goal sends the agent back to the start
	function cliffcheck(s)
		safereturn = (step_reward, s)
		unsafereturn = (cliff_penalty, start)
		s.y > 1 && return safereturn
		(s.x == 1) && return safereturn
		(s.x == xmax) && return safereturn
		unsafereturn
	end
	function step(s::GridworldState, a::GridworldAction)
		(x1, y1) = move(a, s.x, s.y)
		(x2, y2) = boundstate(x1, y1)
		cliffcheck(GridworldState(x2, y2))
	end
	MDP_TD(states, actions, sinit, step, isterm)
end ¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$3f4f078a-9fc4-4b02-b499-a805fd5f1071„§cell_idÙ$3f4f078a-9fc4-4b02-b499-a805fd5f1071¤codeÚ®function max_bias_visualization_comp(;nvars = 2, nmax = 100, nruns = 10_000)
	nlist = collect(2:2:nmax)
	vars = [randn(nmax, nruns) for _ in 1:nvars]
	max_estimate = [begin
		mapreduce(j -> begin
			#split the first n samples of each variable into two halves
			means1 = [mean(view(x, 1:2:n, j)) for x in vars]
			means2 = [mean(view(x, 2:2:n, j)) for x in vars]
			#single estimator: maximum of the means over all n samples
			max1 = maximum(means1 .+ means2) / 2
			#double estimator: select with one half, evaluate with the other, both ways
			max2 = (means2[argmax(means1)] + means1[argmax(means2)]) / 2
			return (max1, max2)
		end, (a, b) -> (a[1]+b[1], a[2]+b[2]), 1:nruns)
	end for n in nlist]
	#each entry is already a mean over samples, so only average over the runs
	estimate1 = [a[1] for a in max_estimate] ./ nruns
	estimate2 = [a[2] for a in max_estimate] ./ nruns
	t1 = scatter(x = 2:2:nmax, y = estimate1, name = "Max of Means Estimate")
	t2 = scatter(x = 2:2:nmax, y = estimate2, name = "Double Max Estimate")
plot([t1, t2], Layout(xaxis_title = "Number of Samples Per Variable", yaxis_title = "Estimate of Maximum Mean", title = "Maximization Bias for $nvars Variables with Zero Mean")) end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$75bfe913-8757-4789-b708-7d400c225218„§cell_idÙ$75bfe913-8757-4789-b708-7d400c225218¤codeÙÑ@htl("""

$(plot_path(windy_gridworld))

$rook_action_display

""")¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$fe2ebf39-4ab3-4aa8-abbd-23389eaf400e„§cell_idÙ$fe2ebf39-4ab3-4aa8-abbd-23389eaf400e¤codeÚÍmd"""
Sarsa converges with probability 1 to an optimal policy and action-value function, under the usual conditions on step sizes (2.7), as long as all state-action pairs are visited an infinite number of times and the policy converges in the limit to the greedy policy (which can be arranged, for example, with $\epsilon$-greedy policies by setting $\epsilon = 1/t$). Below is code that implements Sarsa using the $\epsilon$-greedy method for exploration.
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$98bec66e-d8f3-4d4d-b4ec-5838489164e5„§cell_idÙ$98bec66e-d8f3-4d4d-b4ec-5838489164e5¤codeÙ:const noisy_gridworld = make_noisy_gridworld(l = gridsize)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$b59eacf8-7f78-4015-bf2c-66f89bf0e24e„§cell_idÙ$b59eacf8-7f78-4015-bf2c-66f89bf0e24e¤codeÚÌmd"""
> ### *Exercise 6.10: Stochastic Wind (programming)*
> Re-solve the windy gridworld task with King's moves, assuming the effect of the wind, if there is any, is stochastic, sometimes varying by 1 from the mean values given for each column. That is, a third of the time you move exactly according to these values, as in the previous exercise, but also a third of the time you move one cell above that, and another third of the time you move one cell below that. For example, if you are one cell to the right of the goal and you move left, then one-third of the time you move one cell above the goal, one-third of the time you move two cells above the goal, and one-third of the time you move to the goal.
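
The stochastic wind rule can be sketched as a single transition function. The helper below is hypothetical and is not the notebook's `make_windy_gridworld`. It assumes 1-based coordinates on the 10 × 7 grid, a move given as offsets `(dx, dy)`, wind that pushes toward larger `y`, and the wind of the column being left, which reproduces the example in the exercise.

```julia
using Random

# Hypothetical transition for the stochastic windy gridworld.
# wind[x] is the mean upward push of column x; when it is nonzero, the actual
# push is wind[x] - 1, wind[x], or wind[x] + 1, each with probability 1/3.
function stochastic_wind_move(x, y, dx, dy, wind; xmax = 10, ymax = 7, rng = Random.default_rng())
    w = wind[x]
    gust = w == 0 ? 0 : w + rand(rng, -1:1)
    # positions that would leave the grid are clamped to its edges
    x_next = clamp(x + dx, 1, xmax)
    y_next = clamp(y + dy + gust, 1, ymax)
    return (x_next, y_next)
end
```

The test uses the book's wind values (0, 0, 0, 1, 1, 1, 2, 2, 1, 0) and the goal at (8, 4). Moving left from (9, 4) then lands on (8, 4), (8, 5), or (8, 6), matching the three outcomes described in the exercise.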
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5„§cell_idÙ$1ae30f5d-b25b-4dcb-800f-45c463641ec5¤codeÚmd"""
> ### *Exercise 6.8*
> Show that an action-value version of (6.6) holds for the action-value form of the TD error $\delta_t=R_{t+1}+\gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)$, again assuming that the values don't change from step to step.

The derivation of (6.6) starts with the recursive definition of the return in (3.9):

$G_t = R_{t+1} + \gamma G_{t+1}$

and, using the state-value TD error

$\delta_t \doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t)$

derives the following:

$G_t - V(S_t) = \sum_{k=t}^{T-1} \gamma^{k-t} \delta_k$

Now we have the action-value form of the TD error:

$\delta_t \doteq R_{t+1}+\gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t)$

Let us transform (3.9) in a similar manner to derive the rule:

$\begin{flalign}
G_t - Q(S_t, A_t) &= R_{t+1} + \gamma G_{t+1} - Q(S_t, A_t) + \gamma Q(S_{t+1}, A_{t+1}) - \gamma Q(S_{t+1}, A_{t+1}) \\
&= \delta_t + \gamma (G_{t+1} - Q(S_{t+1}, A_{t+1})) \\
&= \delta_t + \gamma \delta_{t+1} + \gamma^2 (G_{t+2} - Q(S_{t+2}, A_{t+2})) \tag{using recursion} \\
&= \delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1} \delta_{T-1} + \gamma^{T-t}(G_T - Q(S_T, A_T)) \\
&= \delta_t + \gamma \delta_{t+1} + \gamma^2 \delta_{t+2} + \cdots + \gamma^{T-t-1} \delta_{T-1} + \gamma^{T-t}(0-0) \tag{terminal value} \\
&= \sum_{k=t}^{T-1}\gamma^{k-t}\delta_k
\end{flalign}$
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$7d3be915-9092-4261-8435-dd546a7db144„§cell_idÙ$7d3be915-9092-4261-8435-dd546a7db144¤codeÙ¢function cum_max(v::AbstractVector{T}) where T<:Real
	out = similar(v)
	m = first(v)
	for (i, x) in enumerate(v)
		m = max(m, x)
		out[i] = m
	end
	return out
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$71774d5f-7841-403f-bc6b-1a0cbbb72d6d„§cell_idÙ$71774d5f-7841-403f-bc6b-1a0cbbb72d6d¤codeÙ‡const windy_gridworld_mdp_dp = create_gridworld_mdp(10, 7, GridworldState(1, 4), GridworldState(8, 4),
wind_vals, rook_actions, -1.0f0)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$22c2213e-5b9b-410f-a0ef-8f1e3db3c532„§cell_idÙ$22c2213e-5b9b-410f-a0ef-8f1e3db3c532¤codeÙfexample_6_3(;l = params_6_2.l, max_episodes = params_6_2.ep, α = Float32(params_6_2.α), vinit=0.5f0)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$39470c74-e554-4f6c-919d-97bec1eec0f3„§cell_idÙ$39470c74-e554-4f6c-919d-97bec1eec0f3¤codeÙ¯md""" With King's moves, the optimal policy can finish in 7 steps, compared with 15 for the original four actions. What happens after adding a 9th action that causes no movement? """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297„§cell_idÙ$9da5fd84-800d-4b3e-8627-e90ce8f20297¤codeÚÞfunction show_grid_policy(mdp, π, wind::Vector, display_function, name; action_display = king_action_display, scale = 1.0)
	width = maximum(s.x for s in mdp.states)
	height = maximum(s.y for s in mdp.states)
	start = mdp.state_init()
	termind = findfirst(mdp.isterm, mdp.states)
	sterm = mdp.states[termind]
	ngrid = width*height
	@htl("""

$(HTML(mapreduce(i -> """

$(display_function(π[:, i], scale = 0.8))

""", *, eachindex(mdp.states))))

$(HTML(mapreduce(i -> """

$(wind[i])

""", *, 1:width)))

$(action_display)

Wind Values

""") end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$415ea466-2038-48fe-9d24-39a90182f1eb„§cell_idÙ$415ea466-2038-48fe-9d24-39a90182f1eb¤codeÚâfunction monte_carlo_pred_V(π::Matrix{T}, mdp::MDP_TD{S, A, F, G, H}, α::T, γ::T; num_episodes::Integer = 1000, vinit::T = zero(T), V::Vector{T} = initialize_state_value(mdp; vinit=vinit), save_states = Vector{S}()) where {T <: AbstractFloat, S, A, F, G, H}
	check_policy(π, mdp)
	terminds = findall(mdp.isterm(s) for s in mdp.states)
	V[terminds] .= zero(T) #terminal state must always have 0 value
	v_saves = zeros(T, length(save_states), num_episodes+1)
	function updatesaves!(ep)
		for (i, s) in enumerate(save_states)
			i_s = mdp.statelookup[s]
			v_saves[i, ep] = V[i_s]
		end
	end
	updatesaves!(1)
	#there's no first-visit check here, so this is equivalent to every-visit estimation
	function updateV!(states, rewards)
		g = zero(T)
		for t = length(states):-1:1
			#accumulate future discounted returns
			g = γ*g + rewards[t]
			i_s = mdp.statelookup[states[t]]
			V[i_s] += α*(g - V[i_s]) #constant-α step of V toward the return
		end
	end
	for j in 1:num_episodes
		(states, actions, rewards) = runepisode(mdp, π)
		#update value function for each trajectory
		updateV!(states, rewards)
		updatesaves!(j+1)
	end
	return V, v_saves
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$0e488135-49e5-4e71-83b1-05d8e61f0510„§cell_idÙ$0e488135-49e5-4e71-83b1-05d8e61f0510¤codeÙ”const kingplus_gridworld_mdp_dp = create_gridworld_mdp(10, 7, GridworldState(1, 4), GridworldState(8, 4), wind_vals, [king_actions; Stay()], -1.0f0)¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893„§cell_idÙ$1f28280e-ba3b-4ca5-89e4-6ca4a90f5893¤codeÚ’begin
	car_afterstate_results = begin_value_iteration_v(jacks_car_afterstate_mdp, 0.9f0, θ = 0.0001f0)
	π_car_afterstate, v_car_afterstate = makepolicyvalueplots(jacks_car_afterstate_mdp,
car_afterstate_results[1][end], car_afterstate_results[2], length(car_afterstate_results[1]))
	md"""
	### Afterstate Value Iteration Results for Jack's Car Rental
	$([π_car_afterstate v_car_afterstate])
	"""
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$6d9ae541-cf8c-4687-9f0a-f008944657e3„§cell_idÙ$6d9ae541-cf8c-4687-9f0a-f008944657e3¤codeÚfunction figure_6_3(mdp; load_file=true)
	fname = "figure_6_3.bin"
	load_file && isfile(fname) && return deserialize(fname)
	αlist = 0.1f0:0.05f0:1.0f0
	function generate_data(estimator, nep, nruns)
		out = zeros(length(αlist))
		@threads for i in eachindex(αlist)
			rmean = mean(begin
				α = αlist[i]
				(Qstar, πstar, steps, rsum) = estimator(mdp, α, 1.0f0; num_episodes = nep, ϵinit = 0.1f0)
				mean(rsum)
			end for _ in 1:nruns)
			out[i] = rmean
		end
		return out
	end
	interim_data(estimator) = generate_data(estimator, 100, 50_000)
	asymp_data(estimator) = generate_data(estimator, 100_000, 10)
	estimators = [expected_sarsa, sarsa, q_learning]
	names = ["Expected Sarsa", "Sarsa", "Q-learning"]
	interim_traces = [scatter(x = αlist, y = interim_data(estimator), name = "Interim $name", mode = "lines+markers", line = attr(dash = "dash")) for (estimator, name) in zip(estimators, names)]
	asymp_traces = [scatter(x = αlist, y = asymp_data(estimator), name = "Asymptotic $name", mode = "lines+markers", line = attr(dash = "dot")) for (estimator, name) in zip(estimators, names)]
	p = plot([interim_traces; asymp_traces], Layout(xaxis_title = "α", yaxis_title = "Sum of rewards per episode", yaxis_range = [-150, 0]))
	serialize(fname, p)
	return p
end¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8„§cell_idÙ$d4e39164-9833-4deb-84ca-22f49a1c33d8¤codeÚËmd"""
Reference equations:

$\begin{flalign}
V(S_t) &\leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)] \tag{6.2} \\
\delta_t &\doteq R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \tag{6.5}
\end{flalign}$

Rewrite equation (6.5) using the values known at time $t$, where $V_t$ denotes the value function estimate at time $t$:

$\delta_t \doteq R_{t+1} + \gamma V_t(S_{t+1}) - V_t(S_t)$

Now equation (6.2) becomes

$V_{t+1}(S_t) = V_t(S_t) + \alpha \delta_t$
"""¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$f2115666-86ce-4c80-9eb7-490cc7a7715c„§cell_idÙ$f2115666-86ce-4c80-9eb7-490cc7a7715c¤codeÙ¤md""" With the original value initialization, the error passes through a minimum early on due to the symmetry of the value updates created by the initial value. """¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÃÙ$2155adfa-7a93-4960-950e-1b123da9eea4„§cell_idÙ$2155adfa-7a93-4960-950e-1b123da9eea4¤code¬king_actions¨metadataƒ©show_logsÃ¨disabledÂ®skip_as_scriptÂ«code_foldedÂ«notebook_idÙ$9c6be96e-38f7-11f0-2d30-a71f02755abc«in_temp_dirÂ¨metadata€