Short Introduction
Naming conventions are ubiquitous. Usually, we tend to think about them in the context of programming or markup languages, but they are present in every language which has words made up of characters.
Do naming conventions matter, or are they nothing but a convention? Is there a single best naming convention? Is there at least a way to compare them on their merits, or are there no merits to speak of?
Let’s try to clarify.
Naming Conventions in Natural Languages
Let’s begin with natural languages first.
Conveniently, this post is in a natural language called English, and you can already spot a few things. Firstly, almost all text is in lower-case characters. However, some words begin with a capital letter. The rules of modern English say that each sentence must begin with a capital letter. Names (including the names of languages) begin with a capital letter too. Secondly, words are separated by whitespace. However, some combinations of words like “lower-case” use a different separator. I am sure professional linguists have a special explanation for this, but even for a layman, it is clear that using a dash here highlights the fact that these words are more closely related to each other than to other words in the sentence, acting as a single unit.
What would happen if we ignore these conventions? We would end up with something like this:
convenientlythispostisinanaturallanguagecalledenglishandyoucanalreadyspotafewthings firstlyalmostalltextisinlowercasecharactershoweversomewordsbeginwithacapitalletter
Is it possible to read it? Yes. Is it easy? No. The problem is that by dropping the standard conventions we also dropped an important part of the information: visual cues, which help us notice the structure of the text and interpret its meaning.
Now let’s consider some text in German:
Ich schlage ein buntes Kinderbuch auf.
Here, “Kinderbuch” is not a name. It is just a noun. Also, it means “children’s book” and is made of two words: “Kinder” and “Buch” joined together. As you can see, the rules are different to those in English, but the underlying mechanism is the same: conventions are there to act as visual cues and improve clarity.
Naming Conventions in Programming, Markup, and Query Languages
When the first non-natural languages were created, people had to decide on the naming conventions. It was necessary to name things and refer to them by names. However, it was not possible to re-use the naming conventions of natural languages. The problem was that these new languages required more precision. That precision was achieved in various ways, but mostly through scoping and naming. Quite often, it is not possible to rely on short words like “account”, “velocity”, “currency”, or “address”. One has to be more specific and refer precisely to “nostro account”, “velocity of the aeroplane”, “quoting currency of the asset” or “shipping address”. An additional complication was that the syntax of these artificial languages was often fairly complex and feature-rich, which could get in the way of a naming convention for these references.
The result is that multiple naming conventions have been introduced over decades. The popular ones are:
- words_separated_by_underscores
- words-separated-by-dashes, also known as “kebab case”
- wordswithoutseparators
- camelCase
- UpperCamelCase
In most of these variants, there are sub-variants which mandate that everything has to be in lowercase or uppercase. With camelCase variants, it is often insisted that not only the case of the first letter of each word must follow certain rules, but also the case of all other letters is forced to be lowercase. In this case, for example, it is not allowed to write “balanceInUSD”, one has to write “balanceInUsd”, even though “USD” is an acronym and has to be in upper-case characters in pretty much any natural language.
Merits of Naming Conventions
While a preference for a specific naming convention is often seen as little more than that, I do believe it is possible to talk about objectively good/bad features of naming conventions.
Here is my list of what I think makes sense:
1. The naming convention must be compatible with the rest of the language.
2. It must not require that the case of characters is changed from what is the norm for the word.
3. The resulting identifiers should, ideally, be shorter rather than longer.
4. The identifiers should be easy to type on a normal keyboard.
Compatibility with the Language
This requirement simply must be met, and it means that identifiers or their parts will not be erroneously mistaken for something else.
Kebab notation is a good example. It is used in LISPs, where you may see code like this:
(+ x +static-offset+)
Here, we add up “x” and “+static-offset+”. Is it ambiguous? No, it is not. This is because “+”, “x” and “+static-offset+” are separated by whitespace (which is required), and the structure of the AST is trivially understood, as there is no such thing as operator precedence.
As a side note, take a closer look at “+static-offset+”: the convention says that a dash is used to separate parts of the identifier and identifiers with pluses on the sides refer to constants.
Can we use the kebab notation in languages like C?
return current-balance+deposited-cash;
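No, I am afraid not: a C compiler reads the dashes as subtraction, so this becomes an expression over four separate variables. Kebab case could only work if the language also demanded whitespace around every operator: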
return current-balance + deposited-cash;
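To make the difference concrete, here is a rough sketch of two toy tokenizers. Both are simplified assumptions for illustration only: `tokenize_c_like` mimics the C rule that identifiers consist of letters, digits and underscores, while `tokenize_lisp_like` mimics a Lisp reader that splits only on whitespace and parentheses. Neither is a real lexer.

```python
import re

# Hypothetical C-like rule: an identifier is letters, digits and underscores;
# any other non-whitespace character is a one-character operator or punctuation.
C_LIKE_TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")


def tokenize_c_like(source):
    """Split source the way a C-style lexer would: '-' and '+' always stand alone."""
    return C_LIKE_TOKEN.findall(source)


def tokenize_lisp_like(source):
    """Split source roughly like a Lisp reader: only whitespace and parentheses delimit."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


if __name__ == "__main__":
    print(tokenize_c_like("return current-balance+deposited-cash;"))
    print(tokenize_c_like("return current-balance + deposited-cash;"))
    print(tokenize_lisp_like("(+ x +static-offset+)"))
```

Under these assumptions, both C-like inputs produce the same nine tokens, with every dash and plus standing alone, whereas the Lisp-like reader keeps `+static-offset+` as a single token. Extra whitespace only helps if the language itself treats it as the sole delimiter.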
Not Changing Case Unnecessarily
This one is controversial, it seems. Quite a few people believe that the case of the words which constitute an identifier has to be defined solely by a specific simple rule and not by the nature of the word. I think there is value in preserving the natural case for things like acronyms. So I am going to show you some examples, and it will be up to you to decide what is right.
Notation Family | Enforced Lowercase | Enforced Uppercase | Natural |
---|---|---|---|
Underscores | pid_filter, asset_usd_price | PID_FILTER, ASSET_USD_PRICE | PID_filter, asset_USD_price |
Kebab Case | pid-filter, asset-usd-price | PID-FILTER, ASSET-USD-PRICE | PID-filter, asset-USD-price |
CamelCase | pidFilter, assetUsdPrice | PidFilter, AssetUsdPrice | PIDFilter, assetUSDPrice |
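The entries in the table can be produced mechanically. The following is a minimal sketch, assuming a small hand-written set of acronyms (`ACRONYMS`) and three hypothetical policy names (`"lower"`, `"upper"`, `"natural"`) that correspond to the three columns. The function names are mine and do not come from any library.

```python
# Illustrative assumption: words whose natural spelling is all upper case.
ACRONYMS = {"pid", "usd"}


def natural(word):
    """Return a word in its natural case: acronyms upper, everything else lower."""
    return word.upper() if word.lower() in ACRONYMS else word.lower()


def separated(words, sep, policy):
    """Join words with a separator ('_' or '-') under the given case policy."""
    case = {"lower": str.lower, "upper": str.upper, "natural": natural}[policy]
    return sep.join(case(w) for w in words)


def camel(words, policy):
    """'lower' gives camelCase, 'upper' gives UpperCamelCase, 'natural' keeps acronyms."""
    if policy == "natural":
        parts = [natural(w) if w.lower() in ACRONYMS else w.capitalize() for w in words]
        # The first word stays in lower case unless it is an acronym.
        if words[0].lower() not in ACRONYMS:
            parts[0] = words[0].lower()
        return "".join(parts)
    parts = [w.capitalize() for w in words]
    if policy == "lower":
        parts[0] = parts[0].lower()
    return "".join(parts)


if __name__ == "__main__":
    for words in (["pid", "filter"], ["asset", "usd", "price"]):
        for policy in ("lower", "upper", "natural"):
            print(separated(words, "_", policy), separated(words, "-", policy), camel(words, policy))
```

Note that the two enforced policies need nothing but the list of words, while the natural policy needs to know which words are acronyms. That extra knowledge is the price of preserving natural case.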
Shorter is Better than Longer
This one is simple: horizontal space is limited and using unnecessarily_complicated_and_long_identifiers should be avoided.
Therefore, “no spaces” notations win this one, and it may be the reason why some people prefer CamelCase.
However, it is worth noting that notation itself cannot save the day if the language is flawed and perhaps the culture is weird too. And so, you end up with gems like RequestMappingInfoHandlerMethodMappingNamingStrategy.
Not only are such names long, but their specificity does not add to precision: it makes it harder to understand what the thing actually is.
Ease of Typing
For better or worse, there are standards on how computer keyboards are supposed to work, and there is no chance they will change. As a result, some naming conventions are easier to type than others.
It is easiest to type when therearenogapsbetweenwords, but the result is unreadable.
It is easy to type where separators are dashes, and most of the characters are in lowercase. This is because typing a dash is a matter of hitting a single key rather than holding “SHIFT” and hitting a key.
Where underscores_are_used or identifiers are in CamelCase, one has to hold “SHIFT” quite often, which is a bit harder than either of the options above.
Finally, screaming notations, where identifiers are meant to be in uppercase, may be the worst. I say “may be” because, in theory, the programming environment could convert characters to uppercase automatically. If it does not, one has to either enable “Caps Lock” or hold “SHIFT” pretty much non-stop, which is no joy.
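The ranking above can be roughly quantified. The sketch below assumes a US QWERTY layout, where a dash is a plain key press while an underscore and every capital letter require “SHIFT”, and it ignores “Caps Lock” and editor assistance. The functions `shifted_chars` and `shift_holds` are hypothetical helpers written for this post, not an established metric.

```python
def needs_shift(ch):
    """Assumption: on a US QWERTY layout, capitals and '_' need SHIFT; '-' does not."""
    return ch.isupper() or ch == "_"


def shifted_chars(identifier):
    """Number of characters typed while SHIFT is held."""
    return sum(1 for ch in identifier if needs_shift(ch))


def shift_holds(identifier):
    """Number of separate times SHIFT has to be pressed and released."""
    holds = 0
    previous_shifted = False
    for ch in identifier:
        shifted = needs_shift(ch)
        if shifted and not previous_shifted:
            holds += 1
        previous_shifted = shifted
    return holds


if __name__ == "__main__":
    samples = (
        "assetusdprice",
        "asset-usd-price",
        "asset_usd_price",
        "assetUsdPrice",
        "ASSET_USD_PRICE",
    )
    for name in samples:
        print(name, len(name), shifted_chars(name), shift_holds(name))
```

By this crude measure, the kebab-case and no-separator samples need no “SHIFT” at all, the underscore and camelCase samples each need it twice, and the screaming sample needs it held for all fifteen characters.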
Conclusion
Nobody can or should change established standards. Java will remain CamelCased while LISPs will stay kebab-cased. The point I was trying to make is that notations are not exactly equal and some are better than others.
My personal favourite by far is the kebab notation. It is pleasant to type, it looks good, and it is not explicitly restrictive regarding the case of characters, which allows me to type acronyms the way they should be typed. Sadly, it is incompatible with most languages.
The next best notation for me is the one where words are separated by underscores. It is harder to type, as one needs to hold SHIFT down when typing underscores. However, the resulting identifiers are still readable and the core notation is not intrusive.
The variants of CamelCase are worse: while they gain by being slightly shorter (a small benefit), they lose in readability and ease of typing. Sadly, they are quite popular. I cannot fathom why.
Have a good day, and may you never get RSI from TooMuchTyping.