Blog posts

2023

Spectral Theory

1 minute read

Published:

In this post, the spectrum of a function and its spectral radius are defined. Then, some useful properties are stated.
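
For quick reference, here are the standard definitions, assuming the usual setting of a bounded linear operator (or square matrix) \(A\), which the excerpt does not state explicitly: the spectrum and the spectral radius are

\[
  \sigma(A) = \{\lambda \in \mathbb{C} : \lambda I - A \text{ is not invertible}\},
  \qquad
  \rho(A) = \sup\{\vert\lambda\vert : \lambda \in \sigma(A)\},
\]

and Gelfand's formula \(\rho(A) = \lim_{n\to\infty} \lVert A^{n} \rVert^{1/n}\) relates the spectral radius to operator norms.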

Normed and Inner product spaces

2 minute read

Published:

In this post, the definitions of normed and inner product spaces are given with illustrative examples. Then, the small gain theorem is stated.
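
As a reminder of the objects involved (the post's own examples are not reproduced here), a norm \(\lVert\cdot\rVert\) on a vector space satisfies

\[
  \lVert x \rVert = 0 \iff x = 0, \qquad
  \lVert \alpha x \rVert = \vert\alpha\vert\,\lVert x \rVert, \qquad
  \lVert x + y \rVert \le \lVert x \rVert + \lVert y \rVert,
\]

and an inner product \(\langle\cdot,\cdot\rangle\) induces the norm \(\lVert x \rVert = \sqrt{\langle x, x\rangle}\), with the two linked by the Cauchy–Schwarz inequality \(\vert\langle x, y\rangle\vert \le \lVert x\rVert\,\lVert y\rVert\).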

Some preliminaries on Nonlinear Control

9 minute read

Published:

In this post, we review some mathematical preliminaries that are important in understanding the fundamentals of Nonlinear Control theory.

Random Variables

3 minute read

Published:

In this post, we review some basic definitions to understand the fundamentals of random variables.

Random Processes

5 minute read

Published:

In this post, we review some preliminaries and axioms to understand the fundamentals of random processes.

Gradient Descent

4 minute read

Published:

In this post, we will briefly explain what Gradient Descent (GD) is, how it works, why it is useful, and where it is used.
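
As a small taste of the idea, here is a minimal Python sketch of the basic update \(x_{k+1} = x_k - \alpha \nabla f(x_k)\); the quadratic objective, step size, and iteration count below are illustrative choices, not taken from the post.

```python
import numpy as np

def gradient_descent(grad, x0, step_size=0.1, num_iters=100):
    """Repeatedly apply x <- x - step_size * grad(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - step_size * grad(x)
    return x

# Example: minimize f(x) = 0.5 * ||x - b||^2, whose gradient is x - b,
# so the iterates move toward the minimizer b.
b = np.array([1.0, -2.0])
x_min = gradient_descent(lambda x: x - b, x0=np.zeros(2))
print(x_min)  # close to [1.0, -2.0]
```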

Q-Learning: Convergence of the algorithm

1 minute read

Published:

As discussed in the previous post, in this post we will prove the convergence of the Q-learning algorithm using some useful norm tricks and the contraction mapping theorem.

2022

Q-Learning: Understanding the idea

2 minute read

Published:

As we have previously discussed in this post, TD learning uses the mean estimate method to update the mean estimate of the $Q$ value. We proved in this post the convergence of the $Q$ values following the algorithm:
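
To make the mean-estimate idea concrete, the standard incremental-mean update is

\[
  \bar{x}_{k+1} = \bar{x}_k + \tfrac{1}{k+1}\bigl(x_{k+1} - \bar{x}_k\bigr),
\]

and Q-learning applies the same idea with a bootstrapped target built from a sampled transition \((s, a, r, s')\) and step size \(\alpha_k\) (generic notation, not necessarily the post's):

\[
  Q_{k+1}(s,a) = Q_k(s,a) + \alpha_k \Bigl( r + \gamma \max_{a'} Q_k(s',a') - Q_k(s,a) \Bigr).
\]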

TD learning with Linear Function Approximation: i.i.d sampling

2 minute read

Published:

Just like the analysis that we did for the tabular TD learning algorithm in this post, here we will prove the convergence of TD learning with LFA for a wisely selected $\epsilon_k$ under the i.i.d. data sampling assumption. Formally,

Understanding TD learning with Linear Function Approximation

2 minute read

Published:

In the two previous posts (post1 and post2), we proved the convergence of the TD learning algorithm under the noise-free and i.i.d. sampling assumptions, respectively. This guaranteed convergence makes TD learning very powerful for solving reinforcement learning problems. However, one drawback of this method is its asynchronicity, which was briefly mentioned in the conclusion of this post. When we update \(Q_{k+1}(s,a)\), the entries \(\{(s',a')\in \mathcal{S}\times\mathcal{A} \vert (s',a') \neq (s,a)\}\) are not updated, so that \(Q_{k+1}(s',a') = Q_{k}(s',a')\).
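
To illustrate this asynchronicity, here is a minimal Python sketch of a single tabular update step; the array shapes, step size, and TD-error argument are illustrative assumptions rather than the post's code.

```python
import numpy as np

num_states, num_actions = 5, 3
Q = np.zeros((num_states, num_actions))
alpha = 0.1  # illustrative step size

def asynchronous_update(Q, s, a, td_error):
    """Update only the visited entry (s, a); every other entry is carried over
    unchanged, i.e. Q_{k+1}(s', a') = Q_k(s', a') whenever (s', a') != (s, a)."""
    Q_next = Q.copy()
    Q_next[s, a] += alpha * td_error
    return Q_next

Q = asynchronous_update(Q, s=2, a=1, td_error=0.5)
```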

TD learning: deeper analysis (2)

3 minute read

Published:

In the previous post, we proved the convergence of TD learning under the noise-free assumption. As a brief recap, the TD learning algorithm can be written in terms of $D^\pi$ and the resulting noise $n_k$ as

TD learning: deeper analysis (1)

5 minute read

Published:

In the previous post, we discussed the basic idea behind TD learning. In this post, we will go deeper into its analysis and prove the convergence of TD learning in the noise-free case, which we will describe shortly.

Temporal Difference (TD) Learning: Understanding the idea

4 minute read

Published:

The value iteration and policy iteration methods discussed in the previous post require knowledge of the transition matrix. To be precise, applying the Bellman operator requires computing the expected value of the value function (or Q-function) at the next possible state (and action) given the current state (and action). In other words, the model of the Markov Decision Process (MDP) is assumed to be known in advance. This is a feasible assumption when we have access to the system or to a simulator that can be used to collect data. However, this is not always the case, and therefore we need a clever method to find an optimal policy under such circumstances.
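
To see exactly where the transition model enters, the Bellman optimality operator for the Q-function, written in standard MDP notation (\(P\), \(r\), \(\gamma\)) rather than necessarily the post's, is

\[
  (\mathcal{T} Q)(s,a)
  = \mathbb{E}_{s' \sim P(\cdot \mid s,a)}\Bigl[ r(s,a,s') + \gamma \max_{a'} Q(s',a') \Bigr]
  = \sum_{s'} P(s' \mid s,a)\Bigl[ r(s,a,s') + \gamma \max_{a'} Q(s',a') \Bigr],
\]

and the sum over \(s'\) is exactly where the transition probabilities are needed.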

Value Iteration and Policy Iteration: why it works

5 minute read

Published:

Value iteration and policy iteration are two algorithmic frameworks for solving reinforcement learning problems. Both frameworks involve iteratively improving the estimates of the value function (or the Q function) in order to find the optimal policy, which is the policy that maximizes the expected return.
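
As a rough illustration of the value-iteration half of this picture, here is a minimal Python sketch for a finite MDP; the array conventions for `P` and `R`, the discount factor, and the stopping tolerance are assumptions for illustration, not the post's implementation.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Minimal value-iteration sketch for a finite MDP.

    P[s, a, s'] -- probability of landing in s' after taking a in s.
    R[s, a]     -- expected immediate reward for taking a in s.
    Returns the value function V and a greedy deterministic policy.
    """
    num_states, num_actions, _ = P.shape
    V = np.zeros(num_states)
    while True:
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```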

Bellman equation and Contraction mapping theorem

5 minute read

Published:

The contraction mapping theorem is a fundamental result in mathematics which states that if a function is a contraction mapping on a complete metric space, then it has a unique fixed point. In the context of reinforcement learning, the fixed point of the Bellman optimality operator is the optimal value function, from which an optimal policy, a function that maps states to actions and maximizes the expected return, can be obtained.
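
For reference, in standard notation: if \(T\) is a \(\gamma\)-contraction on a complete metric space \((X, d)\), i.e. \(d(Tx, Ty) \le \gamma\, d(x, y)\) for all \(x, y\) with \(0 \le \gamma < 1\), then \(T\) has a unique fixed point \(x^{\ast}\) and the iterates \(T^{k} x_0\) converge to \(x^{\ast}\) from any starting point \(x_0\). The Bellman optimality operator is such a contraction with respect to the sup-norm,

\[
  \lVert \mathcal{T} V_1 - \mathcal{T} V_2 \rVert_{\infty} \le \gamma\, \lVert V_1 - V_2 \rVert_{\infty},
\]

with \(\gamma\) the discount factor.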