Grammatically Rooting Oneself With Parse Trees

 

 

Thinking about all of the abstractions that surround us in the world of technology can sometimes be overwhelming. This is particularly true when you’re trying to wrap your head around a new paradigm or unpack the layers of one or many concepts that you’re struggling to understand.

In the context of learning computer science, there are simply too many abstractions to know, see, or recognize them all, not to mention being able to comprehend all of them!

Abstractions are powerful things when you can see beyond them, and being able to understand how something is abstracted away and why can make you a better programmer. However, by the same token, every abstraction was created for a reason: so that none of us would have to worry about what lies beneath it on a day-to-day basis! We’re not meant to think about abstractions all the time, and for the most part, very few of us actually do. Here’s the thing, though: some abstractions are more equal than others. The ones that most engineers are probably concerned with are the ones that involve how they communicate with their computer, and the ways in which their computer actually understands them. Even if none of us ever has to write a bubble sort algorithm, if we write code, then we have to communicate with our machines.

Well, it’s finally time for us to get to the bottom of these mysteries, and understand the abstractions which power our workflows as programmers.

Parsing for the meaning of parsing

The tree data structure is one that keeps coming up again and again in our computer science adventures. We’ve seen them used to store data of all types, we’ve seen ones that are self-balancing, while others have been optimized for space and handling storage. We’ve even looked at how to manipulate trees by rotating and recoloring them to ensure that they fit a set of rules.

But despite all of these different forms of data structure flora, there is one particular iteration of the tree data structure that we have yet to discover. Even if we knew nothing about computer science, how to balance a tree, or how a tree data structure even works, all programmers interact with one type of tree structure on a daily basis, by virtue of the simple fact that every developer who writes code needs to make sure that their code is understood by their machines.

This data structure is called the parse tree, and it is (one of) the underlying abstractions that allows the code that we write as programmers to become “readable” by our computers.

https://cdn-images-1.medium.com/max/720/1*RJYEISGZl2qUxAaHselQug.webp

Parse tree: a definition.

At its very core, a parse tree is an illustrated, pictorial version of the grammatical structure of a sentence. Parse trees are actually rooted in the field of linguistics, but they’re also used in pedagogy, which is the study of teaching. Parse trees are often used to teach students how to identify the parts of a sentence, and are a common way of introducing grammatical concepts. It’s likely that we’ve each interacted with them from the perspective of sentence diagramming, which some of us might have learned in elementary school.

A parse tree is really just a “diagrammed” form of a sentence; that sentence could be written in any language, which means that it could adhere to any set of grammatical rules.

Sentence diagramming involves breaking up a single sentence into its smallest, most distinct parts. If we think about parse trees from the perspective of diagramming sentences, we’ll quickly begin to realize that, depending on the grammar and language of a sentence, a parse tree could really be constructed in a multitude of different ways!

But what exactly is the computer version of a “sentence”? And how do we go about diagramming it, exactly?

Well, it helps to start with an example of something that we’re already comfortable with, so let’s refresh our memories by diagramming a normal, English sentence.

https://cdn-images-1.medium.com/max/540/1*zQ_bUppUhjPj3JjJ-uQv0w.webp

Simple sentence diagramming with parse trees.

In the illustration shown here, we have a simple sentence: "Vaidehi ate the pie". Since we know that a parse tree is just a diagrammed sentence, we can build a parse tree out of this example sentence. Remember that effectively all we’re trying to do is to determine the different parts of this sentence and break it up into its smallest, most distinct parts.

We can start by breaking up the sentence into two parts: a noun, "Vaidehi", and a verb phrase, "ate the pie". Since a noun cannot be broken down any further, we’ll leave the word "Vaidehi" as it is. Another way to think about it is that, since we can’t break down a noun any further, there will be no child nodes coming from this word.

But what about the verb phrase, "ate the pie"? Well, that phrase isn’t broken down into its simplest form yet, is it? We can dissect it even further. For one thing, the word "ate" is a verb, while "the pie" is more of a noun; in fact, it’s a noun phrase, to be completely specific. If we split up "ate the pie", we can divide it into a verb and a noun phrase. Since a verb cannot be diagrammed with any additional detail, the word "ate" will become a leaf node in our parse tree.

Alright, so all that’s left now is the noun phrase, "the pie". We can split this phrase up into two distinct pieces: a noun, "pie", and its determiner, which is a word that introduces a noun and specifies what it refers to. In this case, the determiner is the word "the".

Once we divide up our noun phrase, we’re done splitting up our sentence! In other words, we’re done diagramming our parse tree. When we look at our parse tree, we’ll notice that our sentence still reads the same way, and we haven’t actually modified it at all. We just took the sentence we were given, and used the rules of English grammar to split it up into its smallest, most distinct parts.

In the case of the English language, the smallest “part” of every sentence is a word; words can be combined into phrases, like noun phrases or verb phrases, which can, in turn, be joined with other phrases to create a sentence expression.
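To make the diagram concrete, here is a minimal Python sketch of the same parse tree. The `Node` class, its field names, and the `leaves` helper are all hypothetical names chosen for illustration; they are just one reasonable way to model the tree we drew by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a parse tree (hypothetical structure for illustration)."""
    label: str                    # grammatical category, e.g. "Noun"
    word: Optional[str] = None    # set only on leaf nodes
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Collect the words stored at the leaf nodes, from left to right."""
    if node.word is not None:
        return [node.word]
    words: List[str] = []
    for child in node.children:
        words.extend(leaves(child))
    return words


# The diagrammed sentence: words are leaves, phrases are internal nodes.
tree = Node("Sentence", children=[
    Node("Noun", word="Vaidehi"),
    Node("VerbPhrase", children=[
        Node("Verb", word="ate"),
        Node("NounPhrase", children=[
            Node("Determiner", word="the"),
            Node("Noun", word="pie"),
        ]),
    ]),
])

print(" ".join(leaves(tree)))  # prints "Vaidehi ate the pie"
```

Reading the leaf nodes from left to right gives back the original sentence, which matches the observation that diagramming a sentence never modifies it.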

https://cdn-images-1.medium.com/max/540/1*c259fXm_B_Ak2n-9jAtFQg.webp

What does it actually mean to parse something?

However, this is just one example of how one specific sentence, in one specific language, with its own set of grammatical rules would be diagrammed into a parse tree. This same sentence would look very different in a different language, especially if it had to follow its own set of grammatical rules.

Ultimately, the grammar and syntax of a language, including the way the sentences of that language are structured, become the rules that dictate how that language is defined, how we write in it, and how those of us who speak the language will end up understanding and interpreting it.

Interestingly, we knew how to diagram the simple sentence, "Vaidehi ate the pie", because we were already familiar with the grammar of the English language. Imagine if our sentence were missing a noun or a verb altogether. What would happen? Well, we’d likely read the sentence the first time around and quickly realize that it wasn’t even a sentence at all! Rather, we’d read it, and almost immediately see that we were dealing with a sentence fragment, or an incomplete piece of a sentence.

However, the only reason that we’d be able to recognize a sentence fragment is if we knew the rules of the English language: namely, that (nearly) every sentence needs a noun and a verb to be considered valid. The grammar of a language is how we can check to see if a sentence is valid in a language; that process of “checking” for validity is referred to as parsing a sentence.
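As a rough illustration of “checking” a sentence against a grammar, the sketch below recognizes only sentences shaped like our example. Everything in it is an assumption made for demonstration: the tiny `LEXICON`, the three grammar rules in the comments, and the function names. Real English grammar is far larger than this.

```python
from typing import List, Optional

# Hypothetical toy lexicon: each known word and its part of speech.
LEXICON = {
    "Vaidehi": "Noun",
    "pie": "Noun",
    "ate": "Verb",
    "the": "Determiner",
}


def parse_noun_phrase(tags: List[Optional[str]], pos: int) -> Optional[int]:
    """NounPhrase -> Determiner Noun. Returns the next position, or None."""
    if tags[pos:pos + 2] == ["Determiner", "Noun"]:
        return pos + 2
    return None


def parse_verb_phrase(tags: List[Optional[str]], pos: int) -> Optional[int]:
    """VerbPhrase -> Verb NounPhrase. Returns the next position, or None."""
    if tags[pos:pos + 1] == ["Verb"]:
        return parse_noun_phrase(tags, pos + 1)
    return None


def is_valid_sentence(text: str) -> bool:
    """Sentence -> Noun VerbPhrase, with every word consumed."""
    words = text.rstrip(".").split()
    tags = [LEXICON.get(word) for word in words]  # unknown words become None
    if tags[:1] != ["Noun"]:
        return False
    end = parse_verb_phrase(tags, 1)
    return end == len(tags)


print(is_valid_sentence("Vaidehi ate the pie."))  # prints True
print(is_valid_sentence("ate the pie"))           # prints False
```

The second call fails for the same reason we would reject it as readers: under these rules, a sentence with no leading noun is only a fragment.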

The process of parsing a sentence to understand it when we read it for the first time involves the same mental steps as diagramming a sentence, and diagramming a sentence involves the same steps as building a parse tree. When we read a sentence for the very first time, we’re doing the work of mentally deconstructing and parsing it.