Intro to Linux and the Bash command line, pt III

Published 2019-01-11 on Yaroslav's weblog

New year, new post. In this third, and most probably final, part of this tutorial/guide series I will go over some useful commands and programs that are present in most standard Linux installations. I will focus on programs and commands that manipulate text output from programs and files. I will also talk a little bit about regular expressions, a powerful tool for searching inside text strings.

Filters

Programs that perform operations on input text and then write the result to standard output are commonly known as filters. You may already be familiar with one of these commands, which is the first one that I'm going to talk about.

cat

This command allows you to see the content of a text file (or files). It stands for concatenate, and not for the house pet. The most basic use of this command is to view the contents of a text file, just by typing cat followed by the path of the file we wish to see. However, as its name implies, it also has the ability to concatenate the contents of multiple text files. For example, say we have the text file 'sample.txt'

user@host:~/Documents/notes$ cat sample.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Despacito   lame
Bowsette    cool
Harold  cool
Sans    coolest
Minions lamest
NPC cool

And we want to concatenate its contents with those of the file 'sample2.txt' and write the result to standard output

user@host:~/Documents/notes$ cat sample.txt sample2.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Despacito   lame
Bowsette    cool
Harold  cool
Sans    coolest
Minions lamest
NPC cool
Troll Face  old
Can haz chezburger  really old
ROFLcopter  super old
Dancing baby    ancient

As usual, this command accepts different options, for example the -n option to display line numbers

user@host:~/Documents/notes$ cat -n sample.txt
     1  Pepe    cool
     2  Tide Pods   lame
     3  Uganda Knuckles cool
     4  Thanos  cool
     5  JPEG    ok
     6  Despacito   lame
     7  Bowsette    cool
     8  Harold  cool
     9  Sans    coolest
    10  Minions lamest
    11  NPC cool

As always, you can check other options by looking up cat and any other command with man, as explained in the previous part.

head

This is a really simple command: it shows the first n lines of a text file or of another program's output. To use it, type head, followed by -n, then the number of lines to show, and then the file. For example, let's say we want to see the first 5 lines of sample.txt

user@host:~/Documents/notes$ head -n 5 sample.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok

If we use the command without passing the number of lines we wish to see, it outputs the first 10 lines by default. For this just type head followed by the path of the file.

tail

Basically the same as head, except it shows the last lines. Let's say we want to see the last three lines

user@host:~/Documents/notes$ tail -n 3 sample.txt
Sans    coolest
Minions lamest
NPC cool

As with head, the default is to output 10 lines.
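
If you'd like to check those defaults yourself, here is a minimal sketch. The scratch directory and the file name lines.txt are made up for the example, and the '>' used to create the file is explained further down in this post.

```shell
# Scratch file with 12 numbered lines (hypothetical name, only for this demo).
workdir=$(mktemp -d)
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$workdir/lines.txt"

first_two=$(head -n 2 "$workdir/lines.txt")   # line 1 and line 2
last_two=$(tail -n 2 "$workdir/lines.txt")    # line 11 and line 12
default_head=$(head "$workdir/lines.txt")     # no -n given: line 1 to line 10
default_tail=$(tail "$workdir/lines.txt")     # no -n given: line 3 to line 12

echo "$first_two"
echo "$last_two"
```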

sort

This command is as obvious as it seems. It sorts the lines of its input, alphabetically by default. For example

user@host:~/Documents/notes$ sort sample.txt
Bowsette    cool
Despacito   lame
Harold  cool
JPEG    ok
Minions lamest
NPC cool
Pepe    cool
Sans    coolest
Thanos  cool
Tide Pods   lame
Uganda Knuckles cool
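
One thing that tends to surprise people is that sort compares lines as text, so numbers don't end up where you might expect. A small sketch follows, using two standard options that I haven't covered above (-n for numeric order, -r for reverse order). The file numbers.txt is made up for the example.

```shell
# Scratch file with a few numbers, one per line (hypothetical name).
workdir=$(mktemp -d)
printf '%s\n' 10 9 100 1 > "$workdir/numbers.txt"

plain=$(sort "$workdir/numbers.txt")        # text order:    1, 10, 100, 9
numeric=$(sort -n "$workdir/numbers.txt")   # numeric order: 1, 9, 10, 100
reversed=$(sort -r "$workdir/numbers.txt")  # text order reversed: 9, 100, 10, 1

echo "$plain"
echo "$numeric"
echo "$reversed"
```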

sed

This one is a really powerful utility to transform and manipulate text. However, to keep this tutorial short, I will only be showing a couple of its most common uses. sed stands for "stream editor".

The way to use sed is to pass it a kind of script (a sed script) that tells it what to do with the text. The first and one of the most basic uses of sed is to perform the same task as head, that is, to get the first n lines. For example, let's say we want the first 7 lines of sample.txt

user@host:~/Documents/notes$ sed '7q' sample.txt
Pepe    cool
Tide Pods   lame
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Despacito   lame
Bowsette    cool

Of course, what I've just told you is a simplification of what it really does. More accurately, sed prints every line it reads by default, and the script '7q' tells it to quit (q) once it reaches line seven, so only the first seven lines are output.

Another basic use of sed, and arguably the most common one, is to perform search and replace operations on text. The basic syntax for this operation is 's/search/replace/' where search is the term you want to search for, and replace is the term you wish to replace it with.

By default it will replace only the first occurrence in each line. However, we can specify which or how many occurrences we want to replace by adding a number and/or letter to the end. For example, if we add a two ('s/search/replace/2') it will replace only the second occurrence in each line.

But what if we want to replace each and every occurrence in all of the text? For that we would use the letter g at the end. Let's say, for example, that we want to replace all occurrences of "cool" in our sample.txt file with "dank". In this case we would type something like this

user@host:~/Documents/notes$ sed 's/cool/dank/g' sample.txt
Pepe    dank
Tide Pods   lame
Uganda Knuckles dank
Thanos  dank
JPEG    ok
Despacito   lame
Bowsette    dank
Harold  dank
Sans    dankest
Minions lamest
NPC dank

One thing to keep in mind is that you should enclose the sed script in single quotes, so that the shell passes it to sed unchanged. Of course these are only some of the most basic uses of this command.
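
To see the difference between replacing the first occurrence, a numbered occurrence, and every occurrence, here is a small sketch. The test line is made up, and it is fed to sed through a pipe (pipes are covered later in this post).

```shell
# A line with three occurrences of the search term (made up for this demo).
line='cool cool cool'

first=$(printf '%s\n' "$line" | sed 's/cool/dank/')     # dank cool cool
second=$(printf '%s\n' "$line" | sed 's/cool/dank/2')   # cool dank cool
all=$(printf '%s\n' "$line" | sed 's/cool/dank/g')      # dank dank dank

echo "$first"
echo "$second"
echo "$all"
```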

grep

This is the last program to manipulate text output that I want to mention. I will demonstrate its basic use in this section, and I will show you a little bit more about it in the next section, where I write about regular expressions.

grep searches its input for a pattern that you give it, and prints the lines that contain that pattern. For example, let's say that we want only the cool (or dank) memes in our file to be displayed

user@host:~/Documents/notes$ grep 'cool' sample.txt
Pepe    cool
Uganda Knuckles cool
Thanos  cool
Bowsette    cool
Harold  cool
Sans    coolest
NPC cool

The text that we passed to it is actually the most basic form of a regular expression, which we will look at in detail next.
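
Notice that 'cool' also matched "coolest", because grep looks for the pattern anywhere in the line. Here is a small sketch with a few options I haven't mentioned above (-v, -c and -w). The file memes.txt and its contents are made up for the example, and -w is assumed to be available, which it is in GNU and BSD grep.

```shell
# Scratch file (hypothetical name and contents).
workdir=$(mktemp -d)
cat > "$workdir/memes.txt" <<'EOF'
Pepe cool
Tide Pods lame
JPEG ok
Sans coolest
EOF

matching=$(grep 'cool' "$workdir/memes.txt")         # Pepe cool, Sans coolest
not_matching=$(grep -v 'cool' "$workdir/memes.txt")  # -v inverts: Tide Pods lame, JPEG ok
count=$(grep -c 'cool' "$workdir/memes.txt")         # -c counts matching lines: 2
whole_word=$(grep -w 'cool' "$workdir/memes.txt")    # -w wants a whole word: Pepe cool

echo "$matching"
echo "$count"
```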

Regular expressions

A regular expression, or regex for short, is a string of text that defines a search pattern for a larger set of text. Regexes are used in many programs, such as text editors and search engines, and can also be of great use in the terminal.

An intermission

Before going into actual regular expressions in grep, I want to mention a couple of characters that can make your life easier when dealing with files in the terminal. They are called wildcards, and they are the asterisk (*) and the question mark (?). If you've ever wondered why it is best to avoid those characters in your files' names, that's why: the shell would try to expand them unless you quote or escape them.

I'll start by explaining the asterisk. When you use the asterisk, you are asking for all files whose names have any number of any characters (including none) in the place where you put it. For example, we could be looking at files that start with sa

user@host:~/Documents/notes$ ls sa*
saturday.txt sample.txt sample2.txt sample.png

Or another example, we could be looking for files that just contain sa in their name

user@host:~/Documents/notes$ ls *sa*
asado.png saturday.txt sample.txt sample2.txt sample.png

Now the question mark. The question mark indicates that there should be exactly one character in its place, just any character. Let's say that we want to see all files with name "sample" that have a three character extension

user@host:~/Documents/notes$ ls sample.???
sample.txt sample.png

Wildcards come in really handy when you need to manipulate multiple files with similar names. If the files that you wish to manipulate don't have similar names, you might want to use curly braces instead, with a comma-separated list of the files inside them. For example

user@host:~/Documents/notes$ rm {monday.txt,december1999.txt,saturday.txt}
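
If you are not sure what a wildcard or a list in curly braces will turn into, you can preview it with echo before handing it to something destructive like rm. A minimal sketch follows, run in a throwaway directory with empty files named after the ones in the examples above. Fixing the locale is my own assumption, made only so that the order of the names is predictable, and the curly braces part assumes Bash.

```shell
export LC_ALL=C   # assumption: fixed locale, so the names come out in a predictable order
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch asado.png monday.txt saturday.txt sample.txt sample2.txt sample.png

starts_with_sa=$(echo sa*)         # sample.png sample.txt sample2.txt saturday.txt
contains_sa=$(echo *sa*)           # asado.png sample.png sample.txt sample2.txt saturday.txt
three_char_ext=$(echo sample.???)  # sample.png sample.txt

echo "$starts_with_sa"
echo "$contains_sa"
echo "$three_char_ext"

# In Bash the braces expand to a plain list of words: monday.txt saturday.txt
echo {monday,saturday}.txt
```

Unlike the wildcards, curly braces don't check whether the files exist. The shell simply writes out every item in the list.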

Back to regex

Now I'll explain some things about regular expressions, and I'll demonstrate some basic uses with grep. Here are some basic concepts

  • . - The dot means a single character (any character). e.g. 'be.r' would match bear, beer, befr, etc.
  • * - The preceding element matches 0 or more times. e.g. 'an*t' would match at, ant, annt, annnt, etc.
  • + - The preceding element matches one or more times. e.g. 'an+t' would match ant, annt, annnt, etc.
  • ? - The preceding element matches 0 or one time. e.g. 'an?t' would match at, and ant.
  • {n} - The preceding element matches exactly n times.
  • {min,} - The preceding element matches at least min times.
  • {min,max} - The preceding element matches at least min times, and no more than max times.
  • | - The pipe, logical OR operator. e.g. 'gray|grey' would match gray and grey.
  • () - The parentheses group multiple characters as one element. e.g. 'gr(a|e)y' would match gray and grey.
  • [abc] - It matches a single character that is one of those inside the brackets.
  • [^abc] - It matches a single character that is not one of those inside the brackets.
  • [a-d] - A range of characters. i.e. a, b, c, or d.
  • ^ - Matches the beginning of the line.
  • $ - Matches the end of the line.
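
Here is a small sketch that tries several of these building blocks with grep -E (extended regex) on a made-up word list, one word per line. The file name words.txt is hypothetical, and I've anchored most patterns with ^ and $ so that they have to match the whole line.

```shell
# Scratch word list (hypothetical name and contents).
workdir=$(mktemp -d)
printf '%s\n' at ant annt bear beer gray grey grow > "$workdir/words.txt"

dot=$(grep -E '^be.r$' "$workdir/words.txt")            # bear, beer
star=$(grep -E '^an*t$' "$workdir/words.txt")           # at, ant, annt
plus=$(grep -E '^an+t$' "$workdir/words.txt")           # ant, annt
optional=$(grep -E '^an?t$' "$workdir/words.txt")       # at, ant
exactly_two=$(grep -E '^an{2}t$' "$workdir/words.txt")  # annt
group=$(grep -E '^gr(a|e)y$' "$workdir/words.txt")      # gray, grey
class=$(grep -E '^gr[ae]y$' "$workdir/words.txt")       # gray, grey
negated=$(grep -E '^gr[^ae]' "$workdir/words.txt")      # grow

echo "$star"
echo "$negated"
```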

So now let's suppose, for a practical example with grep, that we want to find all lines that have "cool" or "ok" in them. In this case we would use the "|" pipe symbol. However, if we use normal grep, we would have to escape the pipe symbol like this "\|". That's why it is better to use "grep -E" to enable extended regex, or its shorter alias "egrep" (recent versions of GNU grep warn that egrep is obsolescent, so prefer "grep -E" in scripts). It would look something like this

user@host:~/Documents/notes$ egrep 'cool|ok' sample.txt
Pepe    cool
Uganda Knuckles cool
Thanos  cool
JPEG    ok
Bowsette    cool
Harold  cool
Sans    coolest
NPC cool

Let's suppose, for another example, that we want to match those lines with a 't' as the last character

user@host:~/Documents/notes$ egrep 't$' sample.txt
Sans    coolest
Minions lamest

I have already mentioned and shown you the use of regexes with grep (and/or egrep). Now I would like to show a more practical example with sed. Yes, sed uses its own script language to alter text input, however, it also makes use of regular expressions.

Let's suppose that we have a file that looks like this

user@host:~/Documents/notes$ cat shortcuts
# Some shortcuts

d       ~/Documents
D       ~/Downloads
m       ~/Music
pp      ~/Pictures
vv      ~/Videos


s       ~/.scripts # My scripts
cf      ~/.config # My configs

As we can see there is a lot of whitespace, and although comments might be of help to humans, they are of no use to the machine. Let's begin by getting rid of the comments. For that we first need to remember the search and replace command of sed, 's/search/replace/g', since we basically want to find any comment-looking string and replace it with, well, nothing. Now we have to think of a regex that will match comments, and for that '#.*' will do. What this regex means is: match '#' and everything after it. Now let's put it together

user@host:~/Documents/notes$ sed 's/#.*//g' shortcuts


d       ~/Documents
D       ~/Downloads
m       ~/Music
pp      ~/Pictures
vv      ~/Videos


s       ~/.scripts
cf      ~/.config

Bam, there it is. However, we still have the blank lines left, and if you pay close attention, the comments have been deleted but the spaces that used to be before some of them are still there.

So first, let's improve our current sed command. To match zero or more spaces (zero because not every comment has a space before it) we use the * symbol, but what do we use for the space itself? That's an easy one: in GNU sed, '\s' matches any whitespace character, so now our sed command looks like this 's/\s*#.*//g'.

Now for the last part, getting rid of the blank lines. For this we need a separate command, but fortunately we can put several commands on one line by separating them with a semicolon (;). We also need a regex that matches empty lines, and that's very easy: '^$' matches the beginning of a line immediately followed by its end. After that, we add a sed command for deleting lines that I haven't mentioned yet (d), and our one-liner is ready...

user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts
d       ~/Documents
D       ~/Downloads
m       ~/Music
pp      ~/Pictures
vv      ~/Videos
s       ~/.scripts
cf      ~/.config

Of course, issuing this command will not replace the original file, it will simply output the result to the terminal screen. If you want to overwrite the original file with the result of the sed command, you can pass sed the '-i' option.
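
Since '-i' overwrites the file, it is worth trying it on a throwaway copy first. A minimal sketch follows, assuming GNU sed (because of '\s'). The '.bak' suffix glued to '-i' asks sed to keep a backup of the original next to the edited file. The shortened shortcuts file below is made up for the example.

```shell
# Throwaway copy of a shortcuts file (shortened, hypothetical contents).
workdir=$(mktemp -d)
cat > "$workdir/shortcuts" <<'EOF'
# Some shortcuts

d ~/Documents
s ~/.scripts # My scripts
EOF

# -i edits the file in place; with .bak the original is saved as shortcuts.bak
sed -i.bak 's/\s*#.*//g; /^$/d' "$workdir/shortcuts"

cat "$workdir/shortcuts"       # the cleaned file: two lines, no comments
cat "$workdir/shortcuts.bak"   # the untouched original
```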

Piping and redirecting output

This post is already getting too long, however there's one more useful thing about *nix systems that I'd like to mention: the pipeline. The pipeline in Unix and Unix-like OSs is a chain of programs in which the output of each one is redirected to the input of the next. Along with that, there are operators to redirect standard output to files (and vice versa).

Redirecting to and from files

Let's suppose that we want to repeat the last example, and want to clean the file of comments and blank lines. We already know how to overwrite that file, but what if we want to save the result to another file using common Unix operators in bash? For that we can use the '>' and '>>' operators. For example, let's say we want to save the result to a second file called "shortcuts_clean"

user@host:~/Documents/notes$ sed 's/\s*#.*//g; /^$/d' shortcuts > shortcuts_clean

Since there was no "shortcuts_clean" file, it has been created automatically. However, if the file had already existed, it would have been overwritten, unless we had used the '>>' operator. In that case, the output would have been appended to the already existing file.

Just as there's '>' to redirect TO files, there's also '<' to redirect from files to a program's standard input. However, most of the time you would just pass the name/path of the file to the program as an argument.
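
The three operators side by side, as a small sketch. The file log.txt is made up for the example.

```shell
# Scratch output file (hypothetical name).
workdir=$(mktemp -d)
out="$workdir/log.txt"

echo "first"  > "$out"    # creates the file
echo "second" > "$out"    # overwrites it: the file now holds only "second"
echo "third"  >> "$out"   # appends: the file now holds "second" and "third"

# '<' feeds the file to sort's standard input instead of passing it a path.
sorted=$(sort -r < "$out")   # third, second

cat "$out"
echo "$sorted"
```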

Piping

Now that we know how to redirect from and to files, we can learn how to redirect from one program to another, with pipes. The pipe operator in *nix systems is the vertical bar symbol (|). Let's suppose that we want to see the first three files in our current directory, for that, we can pipe the output of ls into head, like this

user@host:~/Documents/notes$ ls | head -n 3
asado.png
monday.txt
sample.txt

Now let's get back to our sample.txt file. Let's imagine that we want to keep only those lines that contain "cool" or "lame", and that we want them sorted. Then let's suppose we want them to contain legit terms, and not some antiquated boomer slang, so we want to replace cool with dank, and lame with normie. Finally we want the result to be written to a file instead of the screen. Whew! Sounds like a lot of stuff to do, but it is quite simple, and it looks like this

user@host:~/Documents/notes$ egrep 'cool|lame' sample.txt | sort | sed 's/cool/dank/g;s/lame/normie/g' > memes.txt

So if we now take a look at the file...

user@host:~/Documents/notes$ cat memes.txt
Bowsette    dank
Despacito   normie
Harold  dank
Minions normiest
NPC dank
Pepe    dank
Sans    dankest
Thanos  dank
Tide Pods   normie
Uganda Knuckles dank
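
A long pipeline is easier to get right if you build it one stage at a time and look at the output after each step. Here is a sketch of that approach on a shortened, made-up version of sample.txt, using grep -E in place of egrep.

```shell
# Shortened sample file (hypothetical contents).
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
Pepe cool
Tide Pods lame
JPEG ok
Bowsette cool
EOF

# Stage 1: keep only the lines that mention cool or lame.
stage1=$(grep -E 'cool|lame' "$workdir/sample.txt")
# Stage 2: the same lines, now sorted.
stage2=$(grep -E 'cool|lame' "$workdir/sample.txt" | sort)
# Stage 3: the whole pipeline, written to a file instead of the screen.
grep -E 'cool|lame' "$workdir/sample.txt" | sort \
    | sed 's/cool/dank/g; s/lame/normie/g' > "$workdir/memes.txt"

echo "$stage1"
echo "$stage2"
cat "$workdir/memes.txt"   # Bowsette dank, Pepe dank, Tide Pods normie
```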

And that's basically it.

Post scriptum

Before ending it for good, I want to show some other programs that might be of use in the Bash command line.

less

This command might come in handy when another command outputs more text than fits on the terminal screen. You can pipe (as we have just learned) the output of that command to less, so that you can scroll with your arrow keys, or better yet with the vim-style keys j and k. You can also search for terms by typing slash (/), just like with man.

tar

This program is used in Linux to create and extract archives in the .tar format, which are usually also compressed with gzip (.gz).

There are usually two ways you will be using the program. One to extract files from a compressed archive

user@host:~/Documents/notes$ tar -xzvf oldnotes.tar.gz

And the other to archive and compress files

user@host:~/Documents/notes$ tar -czvf allnotes.tar.gz *

To learn more about the different options of this program, I recommend you check the man pages of tar ('man tar').
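
Here is a small round trip you can run safely in a scratch directory. Besides the c, x, z and f letters used above, the sketch uses two more standard tar options: t, which lists the contents of an archive without extracting it, and -C, which tells tar which directory to work in. The directory and file names are made up for the example.

```shell
# Scratch directories and notes (hypothetical names and contents).
workdir=$(mktemp -d)
mkdir "$workdir/notes" "$workdir/restored"
echo "buy milk" > "$workdir/notes/monday.txt"
echo "sleep in" > "$workdir/notes/saturday.txt"

# c = create, z = compress with gzip, f = name of the archive file.
tar -czf "$workdir/allnotes.tar.gz" -C "$workdir/notes" monday.txt saturday.txt

# t = list what is inside the archive without extracting anything.
listing=$(tar -tzf "$workdir/allnotes.tar.gz")
echo "$listing"

# x = extract, here into a separate directory.
tar -xzf "$workdir/allnotes.tar.gz" -C "$workdir/restored"
cat "$workdir/restored/monday.txt"   # buy milk
```

Listing an archive before extracting it is a good habit, since it shows you whether the files will land in the current directory or inside a directory of their own.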

ssh and scp

You may have already heard about ssh, which stands for "secure shell", even if you are new to Linux or Unix/Unix-like systems. This program is used to connect to other computers over a network (or the internet for instance), especially to servers.

Let's suppose that you have a server with ip address 180.80.8.20 and your user is tux

user@host:~$ ssh tux@180.80.8.20

Of course, here we have assumed that the standard ssh port (22) is being used, otherwise you will have to specify it by passing -p followed by the port number.

Now let's talk about scp, which stands for "secure copy". This command uses the same protocol as ssh, and it's used to copy files from one computer to another over a network. Let's suppose that you want to copy a file from your current computer to the server we used in the previous example

user@host:~$ scp somefile tux@180.80.8.20:/home/tux/directory/

If we were trying to do it the other way around, that is, from the remote computer to your local computer, it would look like this

user@host:~$ scp tux@180.80.8.20:/home/tux/directory/somefile directory/

Just as with ssh, if you are not using the standard port 22, you need to tell the program which port you are trying to connect to, except that in the case of scp the flag is '-P' instead of '-p', and it goes right after "scp".

Well, that's it for this tutorial/guide series, I really hope it was of use to you.

© 2018—2024 Yaroslav de la Peña Smirnov.