Syntax of First-Order Logic

Muhammad Haris Rao

The set of logical symbols for first-order logic with identity are $\{ \doteq, \bot, \neg, \wedge, \vee, \to, \forall, \exists, (, ) \}$. The set of variables is the set \begin{align*} \texttt{Var} = \left\{ x_n, y_n, z_n \mid n \in \omega \right\} \end{align*} A first-order language is defined as follows:

Definition: A first-order langauge $\mathcal{L}$ is a 4-tuple $\{ \mathcal{C}, \mathcal{R}, \mathcal{F}, n : \mathcal{R} \cup \mathcal{F} \longrightarrow \omega \}$ such that $\mathcal{C}, \mathcal{R}$, $\mathcal{F}$, $\{ \doteq, \bot, \neg, \wedge, \vee, \to, \forall, \exists, (, ) \}$ and $\tt{Var}$ are all pairwise disjoint. The set $\mathcal{C}$ is called the set of constant symbols, $\mathcal{R}$ the relational symbols, $\mathcal{F}$ the functional symbols, and $n$ the parity function.

Given $R \in \mathcal{R}$ and $f \in \mathcal{F}$, we will typically write $n_R = n(R)$ and $n_f = n(f)$.

From now, fix a first-order language $\mathcal{L} = \{ \mathcal{C}, \mathcal{F}, \mathcal{R}, n \}$ and write $$\texttt{Char}(\mathcal{L}) = \mathcal{C} \cup \mathcal{R} \cup \mathcal{F} \cup \{ \doteq, \bot, \neg, \wedge, \vee, \to, (, ) \} \cup \texttt{Var} \cup \{ \forall x, \exists x \mid x \in \texttt{Var} \}$$

Definition: For a given finite ordinal $n \in \omega$, an $\mathcal{L}$-string of length $n$ is a function $$ \varphi : n \longrightarrow \texttt{Char}(\mathcal{L}) $$ The set of $\mathcal{L}$-strings of length $n$ is denoted $\texttt{Str}_n(\mathcal{L})$. The set of all $\mathcal{L}$-strings is $$ \texttt{Str}(\mathcal{L}) = \bigcup_{n \in \omega} \texttt{Str}_n(\mathcal{L}) $$ The length function $\text{Length} : \texttt{Str}(\mathcal{L}) \longrightarrow \omega$ is defined by $\text{Length}(\varphi) = n$ if $\varphi \in \texttt{Str}_n(\mathcal{L})$.

Note that we follow the convention where for a variable $x \in \texttt{Var}$, $\forall x$ and $\exists x$ are single characters in an $\mathcal{L}$-string. If $\varphi, \psi$ are $\mathcal{L}$-strings of length $n, m \in \omega$ respectively, then we define the concatenation $$ \varphi \psi : n + m \longrightarrow \texttt{Char}(\mathcal{L}) $$ by the rule $$ \varphi\psi(k) = \begin{cases} \varphi(k) &\text{ , if $k \in n$} \\ \psi(k - n) &\text{ , if $k \notin n$} \end{cases} $$ So $\varphi\psi \in \texttt{Str}_{n + m}(\mathcal{L})$. For convenience, we will often consider elements of $\texttt{Char}(\mathcal{L})$ as elements of $\texttt{Str}_1(\mathcal{L}) \subseteq \texttt{Str}(\mathcal{L})$. In particular, we can concatenate characters and strings together as well in the obvious way. We state without proof that string concatenation is an associative operation, so that for $\varphi, \psi, \chi \in \texttt{Str}(\mathcal{L})$ we can write $\varphi\psi\chi$ unambiguously.

Definition: Let $\varphi, \psi \in \texttt{Str}(\mathcal{L})$ be $\mathcal{L}$-strings. Then $\psi$ is a prefix of $\varphi$ if there exists $\chi \in \texttt{Str}(\mathcal{L})$ such that $\varphi = \psi \chi$. We say that $\psi$ is a proper prefix of $\varphi$ if $\text{Length}(\psi) \ge 1$ and there is $\chi \in \texttt{Str}(\mathcal{L})$ with $\text{Length}(\chi) \ge 1$ satisfying $\varphi = \psi \chi$.

For $n \in \omega$, we say $n$ is an occurrence of $\psi$ in $\varphi$ if there is $\chi \in \texttt{Str}_n(\mathcal{L})$ such that $\chi \psi$ is a prefix of $\varphi$.

Terms and Formulas

Definition: An $\mathcal{L}$-string $t \in \tt{Str} \left( \mathcal{L} \right)$ is said to be a term if any of the following are true:

There exists a constant $c \in \mathcal{C}$ such that $t = c$
There exists a variable $x \in \tt{Var}$ such that $t = x$
There exists a functional symbol $f \in \mathcal{F}$ and $\mathcal{L}$-terms $t_1, t_2, \cdots, t_{n_f} \in \tt{Term} \left( \mathcal{L} \right)$ such that $t = f t_1 t_2 \cdots t_{n_f}$

The set of $\mathcal{L}$-terms will be denoted $\tt{Term} \left( \mathcal{L} \right)$.

Definition: An $\mathcal{L}$-string $\varphi \in \tt{Str} \left( \mathcal{L} \right)$ is an atomic formula if any one of the following are true

$\varphi = (\bot)$
There exist $\mathcal{L}$-terms $t_1, t_2 \in \tt{Term} \left( \mathcal{L} \right)$ such that $\varphi = (t_1 \doteq t_2)$
There exists a relational symbol $R \in \mathcal{R}$ and $\mathcal{L}$-terms $t_1, t_2, \cdots, t_{n_R} \in \tt{Term} \left( \mathcal{L} \right)$ such that $\varphi = (R t_1 t_2 \cdots t_{n_R})$

Definition: An $\mathcal{L}$-string $\varphi \in \tt{Str} \left( \mathcal{L} \right)$ is an $\mathcal{L}$-formula if any one of the following conditions is true:

$\varphi$ is an atomic formula
There exists an $\mathcal{L}$-formula $\psi$ such that $\varphi = (\neg \psi)$
There exist $\mathcal{L}$-formulas $\psi, \chi$ such that $\varphi = (\psi \wedge \chi)$
There exist $\mathcal{L}$-formulas $\psi, \chi$ such that $\varphi = (\psi \vee \chi)$
There exist $\mathcal{L}$-formulas $\psi, \chi$ such that $\varphi = (\psi \to \chi)$
There is an $\mathcal{L}$-formula $\psi$ and a variable $x \in \tt{Var}$ such that $\varphi = \left( \forall x \, \psi \right)$
There is an $\mathcal{L}$-formula $\psi$ and a variable $x \in \tt{Var}$ such that $\varphi = \left( \exists x \, \psi \right)$

The set of $\mathcal{L}$-formulas will be denoted $\tt{Form} \left( \mathcal{L} \right)$.

Unique Readibility.

We will need the following result later on when we define the semantics for first-order languages

Lemma (Unique Reading Lemma): Let $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ be an $\mathcal{L}$-formula. Exactly one of the following is true:

$\varphi = (\bot)$
There exist unique $\mathcal{L}$-terms $t_1, t_2 \in \tt{Term} \left( \mathcal{L} \right)$ such that $\varphi = (t_1 \doteq t_2)$
There exists a unique relational symbol $R \in \mathcal{R}$ and $\mathcal{L}$-terms $t_1, t_2, \cdots, t_{n_R} \in \tt{Term} \left( \mathcal{L} \right)$ such that $\varphi = (R t_1 t_2 \cdots t_{n_R})$
There is a unique $\mathcal{L}$-formula $\psi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = \left( \neg \psi \right)$
There exist unique $\mathcal{L}$-formulas $\psi, \chi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = \left( \psi \to \chi \right)$
There exist unique $\mathcal{L}$-formulas $\psi, \chi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = \left( \psi \wedge \chi \right)$
There exist unique $\mathcal{L}$-formulas $\psi, \chi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = \left( \psi \vee \chi \right)$
There exists a unique $\mathcal{L}$-formula $\psi \in \tt{Form} \left( \mathcal{L} \right)$ and $x \in \tt{Var}$ such that $\varphi = \left( \forall x \, \psi \right)$
There exists a unique $\mathcal{L}$-formula $\psi \in \tt{Form} \left( \mathcal{L} \right)$ and $x \in \tt{Var}$ such that $\varphi = \left( \exists x \, \psi \right)$

What this means simply is that for every formula, there is exactly one way to build that formula up from atomic formulas using the operations of conjunction, disjunction, conditioning, negation, and existential and universal quantification. We will require first the following result:

Lemma: An $\mathcal{L}$-formula is an atomic formula if and only if it has exactly one left and one right parenthesis. Further, the numbers of left and right parenthesis in any $\mathcal{L}$-formula are equal.

Proof. Since $\mathcal{L}$-terms are strings which have no parentheses, it is clear from the definition of atomic formula that atomic formulas have exactly one left and one right parenthesis. Conversely, if $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ is an $\mathcal{L}$-formula which is not atomic, at least one of the last 6 of the 7 conditions defining an $\mathcal{L}$-formula. In each of these cases, there is a substring $\psi$ of $\varphi$ which is also a formula, and the first character of $\psi$ is at a different position than that of $\varphi$. We also have from the definition of an $\mathcal{L}$-formula that each of $\varphi$ and $\psi$ have a left parenthesis as their first character. Then $\varphi$ in fact has at least two left parenthses, one as its first character, and the other as the first character of its substring $\psi$. Thus, for any non-atomic formula $\varphi \in \tt{Form} \left( \mathcal{L} \right)$, there are at least two left parentheses. In particular, non-atomic formulas do not have exactly one left and one right parenthesis. This proves converse of the first claim.

Now to show that the number of left and right parentheses of any $\mathcal{L}$-formula are equal. We induct on the number of left and right parentheses. For the base case, let $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ have exactly one parenthesis. Then since non-atomic formulas have at least two left parenthesis, $\varphi$ must be atomic. We have just shown that atomic formulas have equal numbers of left and right parentheses, namely one, so this completes the base case.

Now for the induction step. Suppose $n \in \mathbb{Z}_{> 0}$ and if $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ is such that $\lambda(\varphi) \le n$, then $\lambda(\varphi) = \rho(\varphi)$. If $\lambda(\varphi) = n + 1$, then in particular $\lambda(\varphi) > 0$ so $\varphi$ is not atomic. So at least one of the last 6 of the 7 conditions defining $\mathcal{L}$-formula hold. We check each now.

If there is $\psi \in \tt{Form}\left( \mathcal{L} \right)$ such that $\varphi = ( \neg \psi )$ then $\lambda(\psi) = \lambda(\varphi) - 1 = n$. So the induction hypothesis holds on $\psi$ and $$ \lambda \left( \varphi \right) = \lambda \left( \left( \neg \psi \right) \right) = 1 + \lambda \left( \psi \right) = 1 + \rho \left( \psi \right) = \rho \left( \left( \neg \psi \right) \right) = \rho \left( \varphi \right) $$

If there is $* \in \{ \wedge, \vee, \to \}$ and $\psi, \chi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = (\psi * \chi)$ then we have $\lambda(\psi), \lambda(\chi) \le n$. So the induction hypothesis holds on $\psi, \chi$ and $$ \lambda \left( \varphi \right) = \lambda \left( \left( \psi * \chi \right) \right) = 1 + \lambda \left( \psi \right) + \lambda \left( \chi \right) = 1 + \rho \left( \psi \right) + \rho \left( \chi \right) = \rho \left( \left( \psi * \chi \right) \right) = \rho \left( \varphi \right) $$

If there is $* \in \{ \forall, \exists \}$, $x \in \tt{Var}$ and and $\psi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = (*x \, \psi)$ then we have $\lambda(\psi) = n$. So the induction hypothesis holds on $\psi$ and $$ \lambda \left( \varphi \right) = \lambda \left( \left( * x \, \psi \right) \right) = 1 + \lambda \left( \psi \right) = 1 + \rho \left( \psi \right) = \rho \left( \left( * x \, \psi \right) \right) = \rho \left( \varphi \right) $$

This completes all the cases defining a non-atomic formula, so we have shown that if $\lambda(\varphi) = n + 1$ then $\rho(\varphi) = n + 1$ as well. This completes the induction step, so we conclude that if $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ then $\lambda(\varphi) = \rho(\varphi)$ as was to be shown. $\blacksquare$

\vspace

Given $\mathcal{L}$-strings $\alpha, \beta$ with $\alpha$ non-empty, we say $\alpha$ is a proper prefix of $\beta$ if there is a non-empty string $\gamma$ such that $\beta$ is the string $\alpha \gamma$.

Lemma: Every proper prefix of a formula has strictly less left parentheses than right parentheses. Consequently, the number of right parentheses of a prefix of a formula is at most the number of left parentheses.

Proof. We induct on the number of left parentheses. If $\lambda \in \tt{Form} \left( \mathcal{L} \right)$ is such that $\lambda(\varphi) = 1$ then $\varphi$ is atomic. Any proper prefix of $\varphi$ begins with a left parenthesis, and doesn't contain the last character of $\varphi$, that is the right parenthesis. So any such proper prefix has exactly one left parenthesis and no right parentheses, which is the desired result. This proves the base case when a formula has a single left parenthesis.

Now for the induction step. Suppose $n \in \mathbb{Z}_{> 0}$ and that if $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ is such that $\lambda(\varphi) \le n$, then any proper prefix of $\varphi$ has strictly less right parentheses than left parentheses. Now suppose $\lambda(\varphi) = n + 1$ and $\alpha \in \tt{Str} \left( \mathcal{L} \right)$ is a proper prefix of $\varphi$. Since $\lambda(\varphi) > 1$, it follows that $\varphi$ is not atomic so we check each of the last 6 cases definining an $\mathcal{L}$-formula.

If there exists $\psi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = (\neg \psi )$, then $\alpha = ($ or $\alpha = (\neg \beta$ for some prefix $\beta \in \tt{Str} \left( \mathcal{L} \right)$ of $\psi$. In the first case, $\lambda(\alpha) = 1 > 0 = \rho(\alpha)$. In the second case, we have $\lambda(\psi) = n$ so the induction hypothesis applies and tells us that $\lambda(\beta) \ge \rho(\beta)$. Then $$ \lambda \left( \alpha \right) = \lambda \left( (\neg \beta \right) = 1 + \lambda \left( \beta \right) \ge 1 + \rho \left( \beta \right) > \rho \left( \beta \right) = \rho \left( (\neg \alpha \right) = \rho \left( \alpha \right) $$

If there exists $* \in \{ \wedge, \vee, \to \}$ and $\psi, \chi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = (\psi * \chi)$, then $\alpha = ( \beta_1$ or $\alpha = (\beta_1 * \beta_2$ for some prefices $\beta_1, \beta_2 \in \tt{Str} \left( \mathcal{L} \right)$ of $\psi, \chi$ respectively. Since $\lambda(\psi), \lambda(\chi) \le n$ we can apply the induction hypothesis on $\psi, \chi$. In the first case, $$ \lambda \left( \alpha \right) = \lambda \left( (\beta_1 \right) = 1 + \lambda \left( \beta_1 \right) \ge 1 + \rho \left( \beta_1 \right) > \rho \left( \beta_1 \right) = \rho \left( (\beta_1 \right) = \rho \left( \alpha \right) $$ and in the second case, $$ \lambda \left( \alpha \right) = \lambda \left( (\beta_1 * \beta_2 \right) = 1 + \lambda \left( \beta_1 \right) + \lambda \left( \beta_2 \right) \ge 1 + \rho \left( \beta_1 \right) + \rho \left( \beta_2 \right) > \rho \left( \beta_1 \right) + \rho \left( \beta_2 \right) = \rho \left( (\beta_1 * \beta_2 \right) = \rho \left( \alpha \right) $$

If there exists $* \in \{ \forall, \exists \}$, $x \in \tt{Var}$ and $\psi \in \tt{Form} \left( \mathcal{L} \right)$ such that $\varphi = ( *x \, \psi )$, then $\alpha = ($ or $\alpha = (*$ or $\alpha = (* x \, \beta$ for some prefix $\beta \in \tt{Str} \left( \mathcal{L} \right)$ of $\psi$. In the first two cases, $\lambda(\alpha) = 1 > 0 = \rho(\alpha)$. In the second case, we can apply the induction hypothesis on $\psi$ since $\lambda(\psi) \le n$ to get $$ \lambda \left( \alpha \right) = \lambda \left( ( *x \, \beta \right) = 1 + \lambda \left( \beta \right) \ge 1 + \rho \left( \beta \right) > \rho \left (\beta \right) = \rho \left( *x \, \beta \right) = \rho \left( \alpha \right) $$

These three cases together show that given the induction hypothesis, if $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ is such that $\lambda(\varphi) = n + 1$ and $\alpha \in \tt{Str} \left( \mathcal{L} \right)$ is a proper prefix of $\varphi$, then $\lambda(\alpha) > \rho(\alpha)$. This completes the induction step.

We conclude that if $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ and $\alpha \in \tt{Str} \left( \mathcal{L} \right)$ is a proper prefix of $\varphi$, then $\lambda(\alpha) > \rho(\alpha)$ as was to be shown. $\blacksquare$

The importance of the preceding lemma is the following

Corollary: No proper prefix of a formula is a formula.

We will need an analogous result for terms as well. A prefix of a term is defined in the obvious way.

Lemma: No proper prefix of a term is a term.

Proof. Terms of length 1 have no proper prefices, so this case is trivially true. We proceed by induction on this base case. Supppose $n \in \mathbb{Z}_{> 0}$ and that if $t \in \tt{Term} \left( \mathcal{L} \right)$ is of length at most $n$ then no proper prefix of $t$ is a term. If $t \in \tt{Term} \left( \mathcal{L} \right)$ is of length $n + 1$ then in particular, it has length at least 2. Of the three conditions defining an $\mathcal{L}$-term, the first two define terms of length 1. So $t$ satisfies the third condition, that is, there exists $f \in \mathcal{F}$ and $t_1, t_2, \cdots, t_{n_f} \in \tt{Term} \left( \mathcal{L} \right)$ such that $t = f t_1 t_2 \cdots t_{n_f}$. Now to show that no proper prefix of $t$ is a term.

If $\alpha$ is a proper prefix of $t$ then there exists $k \in \{ 1, 2, \cdots, n_f \}$ and prefix $\gamma$ of $t_k$ with $\gamma \not= t_k$ such that $\alpha = f t_1 t_2 \cdots t_{k - 1} \gamma$. Suppose for contradiction that $\alpha$ really is a term. Since the first character of $\alpha$ is a funciton symbol, the third condition defining an $\mathcal{L}$-term must hold. So there exist $t_1^*, t_2^*, \cdots, t_{n_f}^*$ such that $\alpha = f t_1^* t_2^* \cdots t_{n_f}^*$.

It follows then that $f t_1 t_2 \cdots t_{k - 1} \gamma = f t_1^* t_2^* \cdots t_{n_f}^*$. Let $\ell \in \{ 1, 2, \cdots, k - 1 \}$ be the smallest index such that $t_\ell \not= t_\ell^*$ if such an index exists. If there is such an $\ell$, then we have by minimality that $f t_1 t_2 \cdots t_{\ell - 1} = f t_1^* t_2^* \cdots t_{\ell - 1}^*$ but $f t_1 t_2 \cdots t_{\ell - 1} t_\ell \not= f t_1^* t_2^* \cdots t_{\ell - 1}^* t_\ell^*$. This can only mean that one of $t_\ell, t_\ell^*$ is a proper prefix of the other since we still have the string equality $f t_1 t_2 \cdots t_\ell \cdots t_{k - 1} \gamma = f t_1^* t_2^* \cdots t_\ell^* \cdots t_{n_f}^*$. But this would contradict the induction hypothesis since $t_\ell, t_\ell^*$ are $\mathcal{L}$-terms with length at most $n$.

Then no such $\ell$ can exist meaning that $t_\ell = t_\ell^*$ for all $\ell \in \{ 1, 2, \cdots, k - 1 \}$. Then because $f t_1 t_2 \cdots t_{k - 1} \gamma = f t_1^* t_2^* \cdots t_{n_f}^*$, we have then $\gamma = t_k^* t_{k + 1}^* \cdots t_{n_f}^*$. So $\gamma$ is non-empty. Recall also that $\gamma$ was defined as a prefix of $t_k$ of smaller length, so we have now that $\gamma = t_k^* t_{k + 1}^* \cdots t_{n_f}^*$ is a proper prefix of $t_k$. But then $t_k^*$ is a proper prefix of $t_k$, which contradicts the induction hypothesis since $t_k^*, t_k$ are $\mathcal{L}$-terms.

We are forced to conclude then that no given proper prefix $\alpha$ of $t$ can be a term. This compeltes the induction step, so we have the desired result that proper prefices of $\mathcal{L}$-terms cannot be $\mathcal{L}$-terms. $\blacksquare$

\vspace

We are now able to prove the unique reading lemma for atomic formulas. That is,

Lemma: Let $\varphi$ be an atomic formula in the language $\mathcal{L}$. Then exactly one of the following is true:

$\varphi = (\bot)$
There exist unique $\mathcal{L}$-terms $t_1, t_2 \in \tt{Term} \left( \mathcal{L} \right)$ such that $\varphi = (t_1 \doteq t_2)$
There exists a unique relational symbol $R \in \mathcal{R}$ and $\mathcal{L}$-terms $t_1, t_2, \cdots, t_{n_R} \in \tt{Term} \left( \mathcal{L} \right)$ such that $\varphi = (R t_1 t_2 \cdots t_{n_R})$

Proof. It is evident by looking at the symbol in the second position of an atomic formula that exactly one of the three conditions defining an atomic formula can and must hold. We will prove the uniqueness statements above.

Suppose $\varphi \in \tt{Form} \left( \mathcal{L} \right)$ is an atomic formula of the second form. That is, there exist $t_1, t_2 \in \tt{Term} \left( \mathcal{L} \right)$ such that $\varphi = (t_1 \doteq t_2)$. Suppose also that $t_1^*, t_2^* \in \tt{Term} \left( \mathcal{L} \right)$ are such that $\varphi = ( t_1^* \doteq t_2^* )$. Then $(t_1 \doteq t_2) = (t_1^* \doteq t_2^*)$. If $t_1 \not= t_1^*$ then clearly, one must be a proper prefix of the other because of the string identity $(t_1 \doteq t_2) = (t_1^* \doteq t_2^*)$. By the previous result, this is not possible so rather $t_1 = t_1^*$. Then, the equivalences $(t_1 \doteq t_2) = (t_1^* \doteq t_2^*)$ and $t_1 = t_1^*$ imply that $t_2 = t_2^*$. This proves that the representation $\varphi = (t_1 \doteq t_2)$ is unique.

Now suppose $\varphi$ is of the third form so $\varphi = \left( R t_1 t_2 \cdots t_{n_R} \right)$ for some $R \in \mathcal{R}$ and $t_1, t_2, \cdots, t_{n_R} \in \tt{Term} \left( \mathcal{L} \right)$. Suppose also that $\varphi = \left( S t_1^* t_2^* \cdots t_{n_S}^* \right)$ for some $S \in \mathcal{R}$ and $t_1^*, t_2^*, \cdots, t_{n_S}^* \in \tt{Term} \left( \mathcal{L} \right)$. Then $\left( S t_1^* t_2^* \cdots t_{n_S}^* \right) = \left( R t_1 t_2 \cdots t_{n_R} \right)$. By extracting the second symbol, we have $R = S$ and so also $n_R = n_S$. Consequently, $t_1 t_2 \cdots t_{n_R} = t_1^* t_2^* \cdots t_{n_R}^*$. Let $\ell \in \{ 1, 2, \cdots, n_R \}$ be the minimal index such that $t_\ell \not= t_\ell^*$ if such an index exists. Then we have $t_1 t_2 \cdots t_{\ell - 1} = t_1^* t_2^* \cdots t_{\ell - 1}^*$ but $t_1 t_2 \cdots t_{\ell - 1} t_\ell \not= t_1^* t_2^* \cdots t_{\ell - 1}^* t_\ell^*$. By the string equality $t_1 t_2 \cdots t_{n_R} = t_1^* t_2^* \cdots t_{n_R}^*$, it will follow that one of $t_\ell, t_\ell^*$ is a proper prefix of the other, which would be absurd since proper prefices of terms cannot be terms. Thus, no $\ell \in \{ 1, 2, \cdots, n_R \}$ such that $t_\ell \not= t_\ell^*$ can exist, so for each $\ell \in \{ 1, 2, \cdots, n_R \}$ we have $t_\ell = t_\ell^*$. This proves the representation $\varphi = ( R t_1 t_2 \cdots t_{n_R} )$ is uniquely determined.

This completes the proof. $\blacksquare$

\vspace

Finally, we have the full unique reading lemma.

Proof of Unique Readability. Let $\varphi \in \tt{Form}$ and check the first character after the opening parenthesis. If it is $\neg$, then $\varphi$ can only be the negation of a formula, if it is $\exists$, then $\varphi$ is an existentially quantified formula, if it is $\forall$, it is a universally quantified formula, if it is a left parenthesis, it is two formulas connected by $\wedge$, $\vee$ or $\to$, and otherwise it is atomic.

If the first character after the opening parenthesis is not a left parenthesis, then $\varphi$ is uniquely determined as either an atomic formula, or an existenally quantified formula, or a universally quantified formula. Suppose $\varphi$ is existentially quantified. By identifying the third characters of these two representations, the variable over which we are quantifying is uniquely determined, so suppose $\varphi$ is identical to both $(\exists x \, \psi)$ and $(\exists x \, \chi )$ for some $\psi, \chi \in \tt{Form}$. Removing $( \exists x$ from the front and $)$ from the end yields that $\psi$ and $\chi$ are identical. Hence, the precise makeup of $\varphi$ is uniquely determined in this case. The case where $\varphi$ is universally quantified is the same, and the case where it is negated is similar as well.

Now we suppose that the second character of $\varphi$ is a left parenthesis. Then, $\varphi$ is $(\psi_1 *_1 \chi_1)$ for some $\psi_1, \chi_1 \in \tt{Form}$ and $*_1$ is one of $\wedge$, $\vee$, or $\to$. Suppose $\varphi$ can also be written as $( \psi_, *_2 \chi_2 )$ for some $\psi_2, \chi_2 \in \tt{Form}$ and $*_2$ one of the binary connectives. Since $\psi_1$ and $\psi_2$ begin at the same position as substrings of $\varphi$, if they are not identical, then one is a proper prefix of the other. But we know that this is impossible since proper prefixes of formulas are not formulas, so they are instead identical. This implies that $*_1$ and $*_2$ are the same as well, and then also $\chi_1$ and $\chi_2$. Hence, the constituents of $\varphi$ are also uniquely determined.

This completes all the cases.

Free and Bound Variables and Substitution

The scope of a quantifier is defined as follows:

Definition: Let $\varphi \in \texttt{Form}(\mathcal{L})$ be an $\mathcal{L}$-formula, and $n \in \omega$ an occurrence of either $\forall x$ or $\exists x$ in $\varphi$ for some $x \in \texttt{Var}$. Then the scope of this occurrence is an $\mathcal{L}$-formula $\psi \in \texttt{Form}(\mathcal{L})$ if $n + 1$ is an occurrence of $\psi$ in $\varphi$.

It is a consequence of the fact that proper prefices of $\mathcal{L}$-formulas are not $\mathcal{L}$-formulas that if the scope exists of a quantifier exists, it is unqiuely defined. It is also true that the scope will always exist, but it is not important for us.

Finally, we have the definition of a variable bound by a quantifier.

Definition: Let $\varphi \in \texttt{Form}(\mathcal{L})$ be an $\mathcal{L}$-formula, $n \in \omega$ an occurrence of some variable $x \in \texttt{Var}$ in $\varphi$. We say that this occurrence is bound if there exists an occurrence $m \in \omega$ of either $\forall x$ or $\exists x$ in $\varphi$ whose scope $\psi \in \texttt{Form}(\mathcal{L})$ satisfies $m < n \le m + \text{Length}(\psi)$. Otherwise, the occurrence is free.

We say that an occurrence $n \in \omega$ of $x$ in $\varphi$ is bound by an occurrence $m \in \omega$ of $\forall x$ or $\exists x$ in $\varphi$ if $n - (m + 1)$ is a free occurrence of $x$ in the scope of $m$.

The variables of an $\mathcal{L}$-string $\varphi \in \texttt{Str}(\mathcal{L})$ are defined as $$ \text{var}(\varphi) = \left\{ x \in \texttt{Var} \mid \text{$x$ occurs in $\varphi$} \right\} $$ The free variables of an $\mathcal{L}$-formula $\varphi \in \texttt{Form}(\mathcal{L})$ are defined as $$ \text{fv}(\varphi) = \left\{ x \in \texttt{Var} \mid \text{$x$ occurs freely in $\varphi$} \right\} $$ We state without proof the following recursive method of computing the set of free variables:

Definition: Let $\varphi \in \texttt{Form}(\mathcal{L})$ be an $\mathcal{L}$-formula. Then,

If $\varphi = (\bot)$, then $\text{fv}(\varphi) = \emptyset$.
If there exist $\mathcal{L}$-terms $t_1, t_2 \in \texttt{Term}(\mathcal{L})$ such that $\varphi = (t_1 \doteq t_2)$, then $\text{fv}(\varphi) = \text{var}(t_1) \cup \text{var}(t_2)$.
If there is a relational symbol $R \in \mathcal{R}$ and $\mathcal{L}$-terms $t_1, t_2, \cdots, t_{n_R} \in \texttt{Term}(\mathcal{L})$ such that $\varphi = (R t_1 t_2 \cdots t_{n_R})$, then $\text{fv}(\varphi) = \bigcup_{i = 1}^{n_R} \text{var}(t_i)$.
If there exists an $\mathcal{L}$-formula $\psi \in \texttt{Form}(\mathcal{L})$ such that $\varphi = (\neg \psi)$, then $\text{fv}(\varphi) = \text{fv}(\psi)$.
If there exist $\mathcal{L}$-formulas $\psi, \chi \in \texttt{Form}(\mathcal{L})$ such that $\varphi = (\psi \wedge \chi)$, then $\text{fv}(\varphi) = \text{fv}(\psi) \cup \text{fv}(\chi)$.
If there exist $\mathcal{L}$-formulas $\psi, \chi \in \texttt{Form}(\mathcal{L})$ such that $\varphi = (\psi \vee \chi)$, then $\text{fv}(\varphi) = \text{fv}(\psi) \cup \text{fv}(\chi)$.
If there exist $\mathcal{L}$-formulas $\psi, \chi \in \texttt{Form}(\mathcal{L})$ such that $\varphi = (\psi \to \chi)$, then $\text{fv}(\varphi) = \text{fv}(\psi) \cup \text{fv}(\chi)$.
If there is a $x \in \texttt{Var}$ and $\psi \in \texttt{Form}(\mathcal{L})$ such that $\varphi = (\forall x \, \psi)$, then $\text{fv}(\varphi) = \text{fv}(\psi)\backslash \{ x \}$.
If there is a $x \in \texttt{Var}$ and $\psi \in \texttt{Form}(\mathcal{L})$ such that $\varphi = (\exists x \, \psi)$, then $\text{fv}(\varphi) = \text{fv}(\psi)\backslash \{ x \}$.

Substitution is defined as follows:

Definition: If $s, t \in \tt{Term}(\mathcal{L})$ are $\mathcal{L}$-terms, and $x \in \tt{Var}$ a variable, $s[x/t]$ is defined as

If there is a constant symbol $c \in \mathcal{C}$ such that $s = c$ then $s[x/t] = s$
If $s = x$ then $s[x/t] = t$
If there is variable $y \in \tt{Var}$ such that $x \ne y$ and $t = y$ then $s[x/t] = s$
If there is a function symbol $f \in \mathcal{F}$ and $\mathcal{L}$-terms $t_1, t_2, \cdots, t_{n_f} \in \tt{Term}(\mathcal{L})$ such that $s = f t_1 t_2 \cdots t_{n_f}$ then $s[x/t] = f t_1^* t_2^* \cdots t_{n_f}^*$ where $t_i^* = t_i[x/t]$ for $i \in \{ 1, 2, \cdots, n_f \}$

If $\varphi \in \tt{Form}(\mathcal{L})$ is an $\mathcal{L}$-formula, $x \in \tt{Var}$ a variable, and $t \in \tt{Term}(\mathcal{L})$ an $\mathcal{L}$-term, then

If $\varphi = (\bot)$ then $\varphi[x/t] = (\bot)$
If there are $\mathcal{L}$-terms $t_1, t_2 \in \tt{Term}(\mathcal{L})$ such that $\varphi = (t_1 \doteq t_2)$ then $\varphi[x/t] = (t_1^* \doteq t_2^*)$ where $t_i^* \doteq t_i[x/t]$ for $i \in \{ 1, 2 \}$
If there is a relational symbol $R \in \mathcal{R}$ and $\mathcal{L}$-terms $t_1, t_2, \cdots, t_{n_R} \in \tt{Term}(\mathcal{L})$ such that $\varphi = (R t_1 t_2 \cdots t_{n_R})$ then $\varphi[x/t] = (R t_1^* t_2^* \cdots t_{n_R}^*)$ where $t_i^* = t_i[x/t]$ for $i \in \{ 1, 2, \cdots, n_R \}$
If there is an $\mathcal{L}$-formula $\psi \in \tt{Form}(\mathcal{L})$ such that $\varphi = (\neg \psi)$ then $\varphi[x/t] = (\neg \psi[x/t])$
If there are $\mathcal{L}$-formulas $\psi, \chi \in \tt{Form}(\mathcal{L})$ such that $\varphi = (\psi \wedge \chi)$ then $\varphi[x/t] = (\psi[x/t] \wedge \chi[x/t])$
If there are $\mathcal{L}$-formulas $\psi, \chi \in \tt{Form}(\mathcal{L})$ such that $\varphi = (\psi \vee \chi)$ then $\varphi[x/t] = (\psi[x/t] \vee \chi[x/t])$
If there are $\mathcal{L}$-formulas $\psi, \chi \in \tt{Form}(\mathcal{L})$ such that $\varphi = (\psi \to \chi)$ then $\varphi[x/t] = (\psi[x/t] \to \chi[x/t])$
If there is an $\mathcal{L}$-formula $\psi \in \tt{Form}(\mathcal{L})$ such that $\varphi = (\forall x \, \psi)$ then $\varphi[x/t] = (\forall x \, \psi)$
If there is an $\mathcal{L}$-formula $\psi \in \tt{Form}(\mathcal{L})$ and variable $y \in \tt{Var}$ such that $y \not= x$ and $\varphi = (\forall y \, \psi)$ then $\varphi[x/t] = (\forall y \, \psi[x/t])$
If there is an $\mathcal{L}$-formula $\psi \in \tt{Form}(\mathcal{L})$ such that $\varphi = (\exists x \, \psi)$ then $\varphi[x/t] = (\exists x \, \psi)$
If there is an $\mathcal{L}$-formula $\psi \in \tt{Form}(\mathcal{L})$ and variable $y \in \tt{Var}$ such that $y \not= x$ and $\varphi = (\exists y \, \psi)$ then $\varphi[x/t] = (\exists y \, \psi[x/t])$

Finally, one more definition.

Definition: If $t \in \tt{Term}(\mathcal{L})$ is an $\mathcal{L}$-term, $x \in \tt{Var}$ a variable, and $\varphi \in \tt{Form}(\mathcal{L})$ an $\mathcal{L}$-formula, then $t$ is free for $x$ in $\varphi$ if whenever $n \in \omega$ is a free occurrence of $x$ in $\varphi$, it is a free occurrence of $y$ in $\varphi[x/y]$ for every $y \in \texttt{Var}$ occuring in $t$.