Linux Diary: Text Processing

Shell script output is usually saved as reports that someone reads later to pick out details.
So, text processing matters a lot in shell scripting.

So let's discuss a few commands for it.

more: Some commands print more output than fits on one screen. Use the more command to view the output one page at a time. Add "| more" after the command, as follows (ll is a common alias for ls -l, so ls -l is used here in case your shell does not define it):
$ ls -l /dev | more

The | is called a pipe. It passes the output of one command as input to the next one.

In more, pressing the spacebar moves the output forward one page at a time, and pressing Enter moves it forward one line at a time. Press q to quit.

less: Like more, less shows the text one screen at a time, but it lets us move backwards as well as forwards. This makes it a very useful tool for reading long output.
The syntax is as follows:
$ command | less
e.g. $ ls -l /proc | less

In addition to scrolling forward or backward, you can search for a pattern using / for a forward search and ? for a backward search. Press n to repeat the search in the same direction and N to repeat it in the opposite direction.

Let's create a text file containing the numbers 1 to 100, one per line:
seq 100 > numbers.txt

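The head command shows the first lines of a file:
$ head numbers.txt        # displays the first 10 lines
$ head -n 3 numbers.txt   # shows the first 3 lines (head -3 is the older short form)
$ tail -n +5 numbers.txt  # shows everything from line 5 onwards; head has no "+5" form, so this job belongs to tail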

The following example shows the usage of the tail command:
$ tail numbers.txt         # displays the last 10 lines
$ tail -n 5 numbers.txt    # shows the last 5 lines
$ tail -n +15 numbers.txt  # shows from line 15 onwards; the older "tail +15" form is obsolete and not accepted by every version of tail

To print lines 61 to 65 from numbers.txt into file log.txt, type the following:
$ head -65 numbers.txt | tail -5 > log.txt
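As a quick check of the head and tail combination above, here is a minimal sketch. The sed line is an alternative that is not part of the original example, and log_sed.txt is a hypothetical file name used only for the comparison.

```shell
# Recreate the sample file: the numbers 1 to 100, one per line.
seq 100 > numbers.txt

# Keep the first 65 lines, then keep the last 5 of those (lines 61 to 65).
head -n 65 numbers.txt | tail -n 5 > log.txt

# Alternative (assumption, not from the post): let sed print the range directly.
sed -n '61,65p' numbers.txt > log_sed.txt

# Show the result; both files should hold the same five lines.
cat log.txt
```

The general pattern is head -n LAST | tail -n COUNT, where COUNT is LAST minus FIRST plus 1.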

The diff command finds the differences between two files:
diff file1 file2

A sample line of diff output is 0a1. It means that line 1 of file2 has to be added after line 0 of file1 (that is, at the very top) to make the two files match.
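To see how the change codes read in practice, here is a small sketch. The file names old.txt and new.txt and their contents are made up for illustration.

```shell
# Two small sample files (hypothetical names and contents).
printf 'apple\nbanana\ncherry\n' > old.txt
printf 'apple\nblueberry\ncherry\ndate\n' > new.txt

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
diff_out=$(diff old.txt new.txt || true)
printf '%s\n' "$diff_out"
# 2c2          line 2 of old.txt changes into line 2 of new.txt
# < banana
# ---
# > blueberry
# 3a4          after line 3 of old.txt, add line 4 of new.txt
# > date
```

Lines starting with < come from the first file and lines starting with > come from the second one. The letters a, c and d stand for add, change and delete.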

The cut command extracts selected columns or characters from each line of text. Its main options are as follows:
• -c: Will specify the filtering of characters
• -d: Will specify the delimiter for fields
• -f: Will specify the field number
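The three options can be tried on a small colon-separated file. This is only a sketch; users.txt and its contents are invented for the example.

```shell
# A sample file with ':' as the field delimiter (hypothetical data).
printf 'alice:x:1001:/home/alice\nbob:x:1002:/home/bob\n' > users.txt

# -c picks character positions: here, characters 1 to 3 of each line.
cut -c 1-3 users.txt
# ali
# bob

# -d sets the delimiter and -f picks fields: here, field 1 only.
cut -d ':' -f 1 users.txt
# alice
# bob

# Several fields can be listed; the delimiter is kept between them.
cut -d ':' -f 1,3 users.txt
# alice:1001
# bob:1002
```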

The paste command works the other way round: it joins files side by side.
We can paste two files horizontally, so that the lines of file_1 become the first column and the lines of file_2 become the second column (separated by a tab by default):
paste file_1 file_2
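A short sketch of paste on two three-line files follows. The names words.txt and digits.txt are placeholders, and the -d option shown in the second command is an extra that the text above does not cover.

```shell
# Two sample files with the same number of lines (hypothetical names).
printf 'one\ntwo\nthree\n' > words.txt
printf '1\n2\n3\n' > digits.txt

# Default: the columns are separated by a tab character.
paste words.txt digits.txt

# -d chooses a different separator, here a comma.
paste -d ',' words.txt digits.txt
# one,1
# two,2
# three,3
```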

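For files that share a common field, that is, a field with the same values in both files, we can combine the matching lines with the join command. By default join matches on the first field, and both files must already be sorted on that field:
join one.txt two.txt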
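Here is a minimal sketch of join, including the sorting step it depends on. The file names and the sample data are assumptions made for the example.

```shell
# Two files that share an ID in the first field (hypothetical data).
printf '3 carol\n1 alice\n2 bob\n' > names.txt
printf '1 sales\n2 support\n3 finance\n' > depts.txt

# join expects both inputs sorted on the join field, so sort them first.
sort -k 1,1 names.txt > names.sorted
sort -k 1,1 depts.txt > depts.sorted

# Match lines whose first field is equal in both files.
join names.sorted depts.sorted
# 1 alice sales
# 2 bob support
# 3 carol finance
```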

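The uniq command removes adjacent duplicate lines from a file. It only compares neighbouring lines, so sort the file first if the duplicates are scattered:
uniq file1

With -d, uniq prints only the duplicated lines, one copy of each:
uniq -d file2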
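The effect of "adjacent" is easiest to see on a file where the same line shows up in two separate places. The sketch below uses a made-up file called letters.txt.

```shell
# "a" appears twice in a row, and then once more later on (hypothetical data).
printf 'a\na\nb\na\nc\nc\n' > letters.txt

# Only neighbouring duplicates collapse, so the later "a" survives.
uniq letters.txt
# a
# b
# a
# c

# -d prints one copy of each group that was repeated.
uniq -d letters.txt
# a
# c

# Sorting first brings all duplicates together.
sort letters.txt | uniq
# a
# b
# c
```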

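The comm command compares two sorted files and prints three columns: lines only in file_1, lines only in file_2, and lines common to both. With GNU comm, the --nocheck-order option turns off the check that the input is sorted:
comm --nocheck-order file_1 file_2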
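A sketch of comm on two small sorted lists follows. The file names are placeholders, and the -1, -2 and -3 options, which hide the matching column, are an addition to what the text above describes.

```shell
# Two sorted sample files (hypothetical names and contents).
printf 'apple\nbanana\ncherry\n' > list_a.txt
printf 'banana\ncherry\ndate\n' > list_b.txt

# All three columns, indented with tabs.
comm list_a.txt list_b.txt

# Hide columns 1 and 2: only the lines common to both files remain.
comm -12 list_a.txt list_b.txt
# banana
# cherry

# Hide columns 2 and 3: only the lines unique to the first file remain.
comm -23 list_a.txt list_b.txt
# apple

# Hide columns 1 and 3: only the lines unique to the second file remain.
comm -13 list_a.txt list_b.txt
# date
```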

The tr command is a Linux utility for translating, deleting, or squeezing repeated characters. It reads standard input only, so a file is passed to it with <.
This will translate the lower case characters to upper case:
$ tr 'a-z' 'A-Z' < filename
This will replace every | with ~:
$ tr '|' '~' < emp.lst
This will squeeze multiple spaces into a single space:
$ ls -l | tr -s " "
In this example, the -s option squeezes multiple contiguous occurrences of the character into a single character.
Additionally, the -d option deletes the listed characters.
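The four uses of tr can be tried on short strings without creating any files. This is just an illustrative sketch with made-up input.

```shell
# Translate: lower case to upper case.
printf 'hello   world\n' | tr 'a-z' 'A-Z'
# HELLO   WORLD

# Translate a single character into another one.
printf 'a|b|c\n' | tr '|' '~'
# a~b~c

# Squeeze: runs of spaces become one space.
printf 'hello   world\n' | tr -s ' '
# hello world

# Delete: remove every "l".
printf 'hello world\n' | tr -d 'l'
# heo word
```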
sort: It sorts the contents of a text file, line by line.
• -n: Will sort as per the numeric value
• -d: Will sort in dictionary order (only blanks, letters, and digits are considered)
• -r: Will sort in the reverse order
• -t: Option to specify delimiter for fields
• -knum: Specifies the number of the field to sort on
• +num: Obsolete way to specify the sorting field, counted from zero; current versions of sort may reject it, so prefer -k
• $ sort -k 4 sample.txt: This will sort starting from the 4th field; use -k 4,4 to compare the 4th field only
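The difference between plain, numeric, and reverse sorting on a field is easy to miss, so here is a small sketch. The file scores.txt and its contents are invented for the example.

```shell
# name:score pairs, with ':' as the field delimiter (hypothetical data).
printf 'carol:9\nalice:100\nbob:10\n' > scores.txt

# Default: sort whole lines as text.
sort scores.txt
# alice:100
# bob:10
# carol:9

# Field 2 compared as text: "10" sorts before "100", which sorts before "9".
sort -t ':' -k 2,2 scores.txt
# bob:10
# alice:100
# carol:9

# Field 2 compared as a number (n modifier on the key).
sort -t ':' -k 2,2n scores.txt
# carol:9
# bob:10
# alice:100

# Field 2 as a number, largest first (n and r modifiers).
sort -t ':' -k 2,2nr scores.txt
# alice:100
# bob:10
# carol:9
```

Writing the key as -k 2,2 rather than -k 2 limits the comparison to the second field instead of running to the end of the line.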

Thursday, August 23, 2018