Translated by AI
Why I Love the AWK Programming Language
I love the AWK programming language, so I'm going to talk about what I like about it. This article is mainly for people who don't know AWK, or who know of it but haven't used it. I also hope that those who already use it will read along from a distance and think, "So that's how some people feel about using it." In the latter half, I'll also introduce a recently published book on AWK, which might be helpful as well.
What is AWK?
AWK is a programming language born at AT&T Bell Laboratories in 1977. AT&T Bell Labs is where UNIX was born. The authors are all legends in this field with deep ties to UNIX. For example, Professor Kernighan is famous as the author of many books, including The C Programming Language. It feels like it might be a powerful language.
AWK is a general-purpose programming language, but it is exceptionally good at "writing one-liners that process text files where one line is one record with minimal effort." Since it was born a long time ago, and because there are many languages today like Python and Ruby that excel at such tasks, there may be people who have never used it or don't even know it exists. Since there haven't been dramatic changes to the language specification since its birth, the look of the source code is a bit old-fashioned. However, even for someone like me who can use modern languages to a certain extent, there are still many situations where I intentionally choose AWK. I will introduce why that is the case and what I LOVE about it.
How to Use AWK
My apologies for the long introduction, but from here I will introduce how to write AWK. The input is a text file, input.txt, which consists of multiple records, each being a single line with two fields, "CPU vendor name" and "processor name", separated by a tab.
amd ryzen
intel pentium
amd athlon
transmeta crusoe
intel core
amd opteron
amd epyc
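If you want to follow along, the file can be recreated with something like the sketch below. Using printf is my own choice here, not something the examples depend on; the only requirement is that a real tab sits between the two fields on every line.

```shell
# Recreate the sample input. printf reuses the format for each pair of
# arguments, and \t puts a literal tab between the two fields.
printf '%s\t%s\n' \
  amd ryzen \
  intel pentium \
  amd athlon \
  transmeta crusoe \
  intel core \
  amd opteron \
  amd epyc > input.txt
```

This writes input.txt into the current directory, so run it somewhere a file of that name can safely be overwritten.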
To display the first field of each line, write it as follows. $1 represents the first field.
$ awk '{print $1}' input.txt
amd
intel
amd
transmeta
intel
amd
amd
To display the second field, change $1 to $2.
$ awk '{print $2}' input.txt
ryzen
pentium
athlon
crusoe
core
opteron
epyc
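One detail worth knowing: by default AWK splits fields on any run of spaces or tabs, which is why the commands above work on tab-separated data without extra options. If a field could itself contain a space, the separator has to be stated with -F. The sketch below uses a made-up record ("core i7" is my own example, not part of input.txt) to show the difference.

```shell
# Hypothetical record whose second field contains a space.
line=$(printf 'intel\tcore i7')

# Default splitting treats any run of blanks as a separator,
# so $2 is only the first word of the processor name.
default_split=$(printf '%s\n' "$line" | awk '{print $2}')

# With -F '\t' only tabs separate fields, so $2 is the whole name.
tab_split=$(printf '%s\n' "$line" | awk -F '\t' '{print $2}')

printf '%s\n%s\n' "$default_split" "$tab_split"
```

For the sample file in this article the two forms give the same result, so the shorter one is fine there.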
This was achieved with a very small number of keystrokes. This is a big advantage when writing one-liners. Actually, for something this simple, you can use the cut command to output the first field like this:
$ cut -f 1 input.txt
amd
intel
amd
transmeta
intel
amd
amd
However, once the task gets a bit more complex, cut is no longer enough. The code below counts the processors whose CPU vendor is "amd". In AWK, you write a condition to the left of a {} block, and the block runs only for the lines that satisfy it. The END block runs once, after all input has been read.
$ awk '$1=="amd"{n+=1}END{print n}' input.txt
4
In this way, AWK is suitable for "not just extracting specific fields, but doing something a little more complex."
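As one more sketch of "a little more complex", the same idea can be extended to count the processors for every vendor at once. This variation is my own, not from the original examples: it uses an associative array (named count here) keyed by the vendor name, and pipes the result through sort because the order of a for-in loop in AWK is unspecified.

```shell
# Self-contained copy of the sample data (tab-separated).
data=$(printf '%s\t%s\n' amd ryzen intel pentium amd athlon \
  transmeta crusoe intel core amd opteron amd epyc)

# count[] is an associative array keyed by vendor name.
# sort makes the output order stable across AWK implementations.
counts=$(printf '%s\n' "$data" |
  awk -F '\t' '{count[$1]++} END{for (v in count) print v, count[v]}' |
  sort)

printf '%s\n' "$counts"
```

Variables in AWK need no declaration and start out empty, which is why count[$1]++ works on the first occurrence of each vendor.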
Why Bother Using AWK?
Now, I will present my view on the question that will naturally arise: "You can do that with other scripting languages too." This part is largely a matter of personal preference, so I have no intention of forcing my thoughts on others.
As an example, let's say we write a one-liner in Ruby to display the second field of input.txt. I chose Ruby as a comparison here, but any language that is considered "good at quickly processing text" would work. Returning to the topic, a one-liner that does the above would look like this:
$ ruby -e 'ARGF.each do |l| puts l.chomp().split("\t")[1] end' input.txt
ryzen
pentium
athlon
crusoe
core
opteron
epyc
Yes, of course you can write it. To my mind, though, while this is fine inside a source file, it is too many keystrokes for a one-liner and quite a chore. What was 10 characters in AWK becomes 50 characters in Ruby.
Right after saying that, Ruby can actually be written like AWK if you use the following command-line options:
- -n: Evaluates the expression after -e for each line.
- -a: Automatically splits the content of each line and saves it in $F.
Using these options, it can be written like this:
$ ruby -nae 'puts $F[1]' input.txt
ryzen
pentium
athlon
crusoe
core
opteron
epyc
Ruby is amazing. But typing -nae is a bit of a hassle, so I don't use Ruby here. Well, there isn't much difference, so it's mostly just a matter of preference. However, when it comes to "code that counts the number of processor names where the CPU vendor is amd," it doesn't work out as well, and I wouldn't even think about writing it as a one-liner.
Situations Where I Don't Use AWK
As mentioned in the "What is AWK?" section, AWK is a general-purpose programming language. You can actually do quite a lot if you set your mind to it. However, when trying to do something as complex as the following, I feel it becomes easier to write code properly in Ruby, Python, or other languages:
- Not one record per line: You can handle this to some extent using getline, but there are limits.
- Complex record structure: For example, it has no built-in features for parsing JSON or YAML.
- Complex logic for processing records: Specifically, when it doesn't end in just two or three statements.
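To give a feel for the "to some extent" in the first item, here is a minimal sketch that uses getline to join two-line records. The layout (vendor on one line, processor on the next) and the variable names are assumptions made up for this illustration.

```shell
# Hypothetical layout: vendor on one line, processor on the next.
records=$(printf '%s\n' amd ryzen intel pentium transmeta crusoe)

# For each vendor line, getline reads the following line into "cpu".
# The > 0 check skips a trailing vendor line that has no partner.
joined=$(printf '%s\n' "$records" |
  awk '{vendor = $0; if ((getline cpu) > 0) print vendor "\t" cpu}')

printf '%s\n' "$joined"
```

This works for a fixed two-line layout, but as soon as records vary in length or nest, the bookkeeping grows quickly, which is where the limits show.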
Just to be clear, AWK is not only for writing one-liners; you can also save the source code to a file and run it using the awk -f <source_file> format.
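As a sketch of that form, the counting one-liner from earlier can be saved to a file and run with -f. The file name count_amd.awk and the temporary directory are assumptions for this illustration only.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir=$(mktemp -d)

# count_amd.awk is a hypothetical file name for the counting program.
cat > "$workdir/count_amd.awk" <<'EOF'
# Count the records whose first field is "amd".
$1 == "amd" { n += 1 }
END         { print n }
EOF

# Self-contained copy of the sample data (tab-separated).
printf '%s\t%s\n' amd ryzen intel pentium amd athlon \
  transmeta crusoe intel core amd opteron amd epyc > "$workdir/input.txt"

result=$(awk -f "$workdir/count_amd.awk" "$workdir/input.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

In a file there is room for whitespace and comments, so a program that would be cramped as a one-liner stays readable.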
In summary, I use AWK in cases like these:
- I want to process files where one record corresponds to one line.
- Single-purpose commands like cut are insufficient.
- The processing applied to each line is simple enough to be written as a one-liner.
While the use cases are somewhat limited, such situations occur on a daily basis. Of course, since making this judgment can be a hassle, choosing a scripting language from the start is a perfectly valid option. It's just that I personally choose to use AWK in the scenarios mentioned above.
As an aside, there is a game called awk-raycaster, which is a simplified version of a 3D shooter like DOOM. I think this is a great example of how much can be achieved with AWK. However, if you seriously wanted to build something like that, it would be far easier to use a language other than AWK. I'm not sure what the author's motivation was, but I personally love this kind of thing.
Book Introduction
The Japanese translation of The AWK Programming Language, Second Edition, written by the authors of AWK themselves, was published in 2024 (the original English edition was published in 2023). This book is an excellent resource for learning AWK, so I'd like to introduce it.
The contents of the book include the following:
- Explanation of basic syntax
- Creating programs for simple tasks
- Creating more substantial programs
- Reference
The section on creating substantial programs was quite interesting, with a collection of rather hardcore content. It gave me a fresh perspective, making me think, "I never thought of doing something like this with AWK." However, while I enjoyed reading it, it also reaffirmed my belief that I probably won't be doing complex things with AWK in the future.
Personally, what I valued most was having the reference section in physical form. Since the language specification is small, it's nice that it's only a few dozen pages long. It also introduces support for Unicode and CSV, which were added in the long time since the first edition was released (the English version was published in 1988, and the Japanese version in 1989). I think both are excellent extensions given the nature of the AWK language.
The afterword describes the design intent of the language and how and why it has been extended. I found this afterword enjoyable to read and felt it holds historical value.
Conclusion
I've finished talking about my love for AWK. AWK is a very good language if you use it correctly. If you're interested, I think it's worth giving it a try. However, it's not something you absolutely can't live without, so if this article didn't resonate with you at all, it's not something you need to force yourself to learn. You can do as you please.
That's all.