What is Word Order?
Word order is a term from linguistics and it is well-defined for natural languages. In these languages, there are verbs and nouns (as well as other parts of speech). Nouns can act as subjects or objects. The word order is the standard sequence in which a verb, a subject, and an object would go in a sentence. The order differs from language to language and some languages allow for some flexibility, but it appears that even where flexibility is allowed, there is normally a preferred order of words.
Let’s consider a few examples.
SVO (subject-verb-object) order is the one used in English, Thai, French, and so on. For instance, in the sentence “The ship carries cargo”, “the ship” is the subject, “carries” is the verb, and “cargo” is the object.
SOV (subject-object-verb) is used in Latin, Turkish, and Japanese. I am not an expert in Latin, but something like “Homo verum dicit” may be an example. Here, “Homo” (a man/person) is a subject, “verum” (truth) is an object, and “dicit” (speaks) is the verb.
VSO (verb-subject-object) is used in Irish, Filipino, and Welsh. I am even less of an expert in Irish (it is in fact incomprehensible to me), but I hope the following is correct. In the sentence “Itheann bó féar”, “Itheann” (eats) is the verb, “bó” (a cow) is the subject, and “féar” (grass) is the object.
That makes sense, right?
A standard order makes it pretty clear what is going on. For instance, it is reassuring that a cow eats grass, but it is far less so if it is the grass that eats a cow. Precision is vital.
Word Order in Programming Languages
What is Word Order in Programming Languages?
Is there such a thing at all?
For a start, there are no subjects, verbs and objects in programming languages. At least, not in the normal sense. Then, there are significant differences between natural and programming languages. The former are relatively ambiguous while the latter are relatively precise. The former have a fairly fixed set of words and the context matters immensely, while the latter allow us to define new “words” on the fly. By and large, there is no “subject” in programming languages: the only true subject is whoever writes the program, and the rest is “objects”, “commands”, “definitions”, and so on. Finally, code is not a conversation. Natural languages are meant for communication between people, while programming languages are meant for creating instructions.
Nevertheless, there are similarities between these two classes of languages. So let’s draw upon that.
While there is a great multitude of programming languages, let’s consider only the imperative ones. The ideas can probably be generalised a bit further, but that is a story for another day.
Instead of verbs-subjects-objects, let’s talk about code (the closest analogue to verbs), arguments (loosely similar to objects, and occasionally subjects), and contexts (these are present only in some languages and normally correspond to objects or classes).
It is not hard to see that the most common “word order” would be XCA (conteXt-code-arguments).
In Python:
o.append(item)
In Java:
o.append(item);
Here, in the context “o” we “append” (code) the “item” (argument).
It is not hard to see that XCA word order is the standard in object-oriented languages.
Another common “word order” is CXA (code-conteXt-arguments) or CA (code-arguments). The distinction is almost non-existent, as in the languages which have such a word order there is little practical difference between the two.
For example,
(write-out-csv table file)
could mean “call method ‘write-out-csv’ on object ‘table’ and pass ‘file’ in as an argument”. However, it may equally mean “call a function ‘write-out-csv’ and pass both ‘table’ and ‘file’ to it as arguments”. The exact meaning depends on how “write-out-csv” was defined. The important thing here is that we name the action first and then say which data this action should work with.
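The same ambiguity can be reproduced in Python, which is mainly an XCA language but also lets a method be called code-first through its class. The sketch below is purely illustrative: the `Table` class, its `rows` field, and both `write_out_csv` definitions are hypothetical names invented for this example, not part of any library.

```python
import io


class Table:
    """Hypothetical table type, used only for illustration."""

    def __init__(self, rows):
        self.rows = rows

    def write_out_csv(self, file):
        # Write each row as one comma-separated line.
        for row in self.rows:
            file.write(",".join(str(cell) for cell in row) + "\n")


def write_out_csv(table, file):
    """Hypothetical free function with the same effect (the CA reading)."""
    for row in table.rows:
        file.write(",".join(str(cell) for cell in row) + "\n")


table = Table([[1, 2], [3, 4]])

xca = io.StringIO()
table.write_out_csv(xca)          # context, then code, then argument

cxa = io.StringIO()
Table.write_out_csv(table, cxa)   # code first, then context and argument

ca = io.StringIO()
write_out_csv(table, ca)          # code first, then two plain arguments
```

All three calls write the same text. Only the definition of the name decides whether “table” is a context or just the first argument.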
Another example is
new Circle(centre, radius);
Here, the instruction is “[create] new [object of class]”, “Circle” is the Java class playing the role of a context, while “centre” and “radius” are arguments.
The Matter of Smoothness
Before we move any further, let’s discuss the idea of “smoothness”.
As an example, let’s consider planning.
The plan A is:
- Wake up.
- Have breakfast.
- Get dressed.
- Travel to the office.
- Do work until lunch.
- Have lunch.
- Do work until the end of working hours.
- Travel back home.
- Have dinner.
- Read a book while listening to music.
- Prepare for sleep.
- Go to bed.
The plan B is:
- Study in a school.
- Study in a university.
- Work in a couple of companies, and gain experience.
- Create and build your own business.
- Sell the business years later and retire early.
The plan C is:
- Study in a school for several years.
- Have an ice cream cone.
- Study in a university.
- Visit a foreign country for a couple of days.
- Find employment and work for five years.
- Buy a pair of socks.
I suppose you do not need a PhD to spot that while plan A and plan B are quite different, plan C is the odd one out, and by a huge margin. It is “not smooth”. Its steps are clearly not of the same order of magnitude, whether you consider the time they would take or their importance. Plans A and B, on the other hand, are quite “smooth”.
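One rough way to put a number on this is to compare the longest and the shortest step of each plan on a logarithmic scale. The sketch below is only an illustration: the durations are made-up guesses for the steps listed above, and `spread` is a hypothetical helper, not an established metric.

```python
import math

# Made-up step durations in hours (illustrative guesses only).
HOURS_PER_YEAR = 24 * 365

plans = {
    "A": [0.25, 0.5, 0.25, 1, 4, 1, 4, 1, 1, 2, 0.5, 0.25],
    "B": [10 * HOURS_PER_YEAR, 4 * HOURS_PER_YEAR, 5 * HOURS_PER_YEAR,
          10 * HOURS_PER_YEAR, 1 * HOURS_PER_YEAR],
    "C": [10 * HOURS_PER_YEAR, 0.25, 4 * HOURS_PER_YEAR, 48,
          5 * HOURS_PER_YEAR, 0.25],
}


def spread(durations):
    """Orders of magnitude between the longest and the shortest step."""
    return math.log10(max(durations) / min(durations))


spreads = {name: spread(steps) for name, steps in plans.items()}
```

With these guesses, plans A and B each span roughly one order of magnitude, while plan C spans more than five. A smaller spread means a smoother plan.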
The same idea applies in many contexts, and we will draw upon it a bit later.
Does the Word Order Matter? Why?
The human brain has evolved to deal with natural languages. It contains structures that support language constructs in various ways and that also have their own quirks and limitations. For programming languages, the brain most likely uses the same “hardware”, so to speak. As a result, similar cognitive processes are at play.
With natural languages, the specific word order does not seem to be more than a matter of convention. If you can say something with one word order, you can say the same thing with another one. But I do believe that in the case of programming languages, the word order matters more, perhaps much more. Here is why.
In a programming language, ideas are defined and given names all the time. Each identifier is a reference to an idea, and it is important to be clear about which idea it refers to. That is why there are various mechanisms for defining the scope of names: modules, namespaces, packages, functions, blocks, you name it. At any point in the code, the meaning of an identifier is determined by the current scope and what is defined within it.
Now think about the process of typing the code. Let’s say it is a normal language which is meant to be read left to right, not something like APL. At every point in time, there is a limited number of identifiers which could make sense. Just like you would not write “reads” after “My dog”, you would not write a call to a method for an object if that object does not have that specific method. The code that has been typed so far defines the context which naturally acts as a filter and defines what may be expected next.
My view is that the preferred approach is where at every step the set of valid choices is reasonably small, so that at no point the choice would be overwhelming. That feature also reduces the probability of a surprise and improves the efficiency of auto-completion.
Another thing which I think is beneficial is that the number of choices at each step should be moderate: neither trivially small nor overwhelmingly large. In other words, as you progress with typing the code, the number of choices should change smoothly from one step to the next.
How does it work with different word orders?
In the case of XCA, the first thing to be written is the context. The context is normally an object or a class. The number of objects is naturally limited by the current scope. It could be a few imported definitions like “standard output”, the arguments of the method (if this is the code of a method), possibly “self”/“this”, and maybe fields of the object. Normally, that would be a few tens of options, not more. Yes, you can also type a literal, like a string literal, but that does not change the overall picture. The number of classes (or typeclasses, traits, you name it) is also fairly limited, usually by the fact that they have to be explicitly imported and the list of imports is normally small (as it should be).
Now that the “context” is typed, the set of methods is naturally limited to the methods defined for the object/class. Again, normally it is not hundreds, it is tens. Once the method is typed, we come to the method’s arguments. Here, the choice is not wider than that for the object. It may be narrower, if the language is statically typed: invalid options are filtered out based on the types and the signature of the method.
The situation is quite different in the case of CXA/CA word orders. We need to start with the method or function. How many of them are there? Well, there are standard ones like arithmetic operations, so let’s say ten to twenty, plus all the applicable methods on all objects which are in scope, plus all the methods which may be applicable to objects which may be constructed or obtained in some other way, plus possibly all functions which are not associated with any objects whatsoever. To say this is a lot would be an understatement.
Let’s say we have crossed that Rubicon and the method/function name has been typed. Hurray! What’s next? Then, you would normally type the object on which the method is to be called. There may be some candidates in the current scope, and the number of them is likely going to be very small. Then you would follow up by typing arguments (if there are any to be typed), and on each step, you will have a fairly small set of options.
Therefore, it appears that with the XCA order, the set of applicable options is moderate at each step of typing, while with the CXA/CA order, the set of options is initially large and then becomes small. This lack of smoothness in CXA/CA is hardly a nice feature.
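The difference can be made tangible with a crude count in Python. This is only a sketch under stated assumptions: the `scope` dictionary is a hypothetical stand-in for the names visible at the cursor, `public_callables` is a helper invented here, and counting attribute names is a rough proxy for what a real auto-completion engine would offer.

```python
import builtins


def public_callables(obj):
    """Names of the public callable attributes of obj."""
    return [
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    ]


# Hypothetical local scope: the objects visible at the cursor.
scope = {"items": [], "name": "", "counts": {}}

# XCA: choose a context first, then one of that context's methods.
xca_step1 = len(scope)
xca_step2 = {name: len(public_callables(obj)) for name, obj in scope.items()}

# CA: choose the code first, so every builtin callable and every
# method of every object in scope is a candidate at the same time.
ca_step1 = len(public_callables(builtins)) + sum(xca_step2.values())
```

Whatever the exact numbers are on a given Python version, `ca_step1` is larger than any single entry of `xca_step2`, because it contains all of them plus the builtins.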
There is another difference, and it has to do with scoping. The XCA word order is used in languages where methods are closely associated with objects (or their classes). In those languages, import statements mainly pull objects and classes into the scope: methods just follow along.
In languages with the CXA/CA word orders, methods (normally called functions) are only loosely coupled with objects and classes, and import statements have to pull them in on their own. I am sure that sometimes it works fine, but it is also easy to see how this can get messy and awkward.
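Python's standard library happens to offer both styles for path handling, which makes for a small side-by-side comparison. This is a sketch of the import pattern only; the file name "report.csv" and the directory "data" are made up for the example.

```python
# XCA style: one imported class brings all of its methods along.
from pathlib import PurePosixPath

# CA style: each function has to be pulled into scope by name.
from posixpath import basename, join, splitext

# The same task in both styles: join two parts, then drop the extension.
xca_result = PurePosixPath("data").joinpath("report.csv").stem
ca_result = splitext(basename(join("data", "report.csv")))[0]
```

Both expressions yield the same string, but the second needs three imported names where the first needs one, and that list grows with every additional operation used.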
Final Words
It appears that it is possible to talk about word order in programming languages, as long as certain generalisations are made and the specifics of programming languages are taken into account.
Almost all programming languages of today have the XCA (conteXt-code-arguments) word order. A few languages have the CXA (code-conteXt-arguments) word order. And perhaps fewer still have something even more exotic.
The prevalence of XCA languages can be explained in various ways. My own theory is that the smoothness of XCA languages and better scoping and importing mechanisms are the main reasons here. I do wonder, however, whether XCA languages are “natural” for people who speak VSO or VOS languages.
Finally, I like LISPs for various reasons, but I have to admit that I do not like the CXA/CA word orders. Perhaps one day someone will create a LISP dialect with the XCA word order and import semantics similar to those of object-oriented languages. It would be interesting to have a look at that.