Learn awk Command in Linux — Text Processing with Examples

DodaTech Updated Jun 20, 2026 7 min read

awk is a powerful text-processing language built into Linux. It reads input line by line, splits each line into fields, and lets you apply patterns, calculations, and formatting — all in one command.

What You’ll Learn

By the end of this tutorial, you’ll know how to print columns, set custom field separators, match patterns, use BEGIN/END blocks, work with built-in variables (NR, NF, FS), write conditionals, format output with printf, perform calculations, and use arrays.

Why awk Matters

awk is the go-to tool for column-based data extraction — parsing log files, processing CSV reports, analyzing system output, and transforming structured text. Durga Antivirus Pro uses awk to parse threat detection logs, and DodaZIP uses it to compute compression statistics from batch job outputs.

awk Learning Path

sed Command → awk Command (you are here) → xargs Command → Shell Scripting → System Administration

Prerequisites: Familiarity with essential Linux commands. Understanding grep and sed helps.

Syntax Overview

awk 'pattern { action }' file...

awk runs the action for every line that matches the pattern. If the pattern is omitted, the action runs on every line; if the action is omitted, awk prints each matching line.
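As a minimal sketch of the pattern/action split, the snippet below runs awk on three made-up lines (the fruit data is an assumption for illustration only), first with a pattern alone and then with a pattern plus an action.

```shell
# Hypothetical three-line input, inlined so the snippet runs on its own.
input='apple 3
banana 12
cherry 7'

# Pattern only: the default action prints the whole matching line.
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 5')
echo "$pattern_only"

# Pattern plus action: print just the first field of each matching line.
both=$(printf '%s\n' "$input" | awk '$2 > 5 {print $1}')
echo "$both"
```

The first command prints the banana and cherry lines unchanged; the second prints only their names.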

Built-in Variable	Description
`$0`	Entire current line
`$1`, `$2`, …	Individual fields
`NR`	Current record (line) number, counted across all input files
`FNR`	Record number within the current file
`NF`	Number of fields in current line
`FS`	Field separator (default: whitespace)
`OFS`	Output field separator
`RS`	Record separator (default: newline)
`ORS`	Output record separator
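The record separators are the least obvious entries in the table, so here is a small sketch of how RS and ORS change what awk treats as a "line". The inputs are made up for illustration.

```shell
# RS="," makes awk treat each comma-separated chunk as one record.
records=$(printf 'a,b,c' | awk 'BEGIN { RS="," } { print NR ": " $0 }')
echo "$records"

# ORS="-" replaces the newline that print normally appends after each record.
joined=$(printf '1\n2\n3\n' | awk 'BEGIN { ORS="-" } { print }')
echo "$joined"   # → 1-2-3-
```

The first command numbers three records (a, b, c) even though the input contains no newline at all.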

10 Practical Examples

Create a sample data file:

cat > employees.txt << 'EOF'
Alice Johnson 75000 IT
Bob Smith 62000 HR
Charlie Brown 88000 IT
Diana Wilson 54000 Marketing
Eve Davis 95000 IT
Frank Miller 71000 HR
Grace Lee 82000 Marketing
Henry Taylor 67000 IT
EOF

1. Print Columns

Print first and last name (columns 1 and 2):

awk '{print $1, $2}' employees.txt

Alice Johnson
Bob Smith
Charlie Brown
Diana Wilson
Eve Davis
Frank Miller
Grace Lee
Henry Taylor

2. Custom Field Separator

Process /etc/passwd with colon separator:

awk -F':' '{print $1, $6}' /etc/passwd | head -5

root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin

3. Pattern Matching

Print IT department employees:

awk '$4 == "IT" {print $1, $2}' employees.txt

Alice Johnson
Charlie Brown
Eve Davis
Henry Taylor

Print employees earning more than 70,000:

awk '$3 > 70000 {print $1, $2, $3}' employees.txt

Alice Johnson 75000
Charlie Brown 88000
Eve Davis 95000
Frank Miller 71000
Grace Lee 82000
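Patterns can also be combined with the logical operators && and ||, or negated with !=. A minimal sketch, using four rows copied from employees.txt and inlined so the snippet runs on its own:

```shell
rows='Alice Johnson 75000 IT
Bob Smith 62000 HR
Frank Miller 71000 HR
Henry Taylor 67000 IT'

# Both conditions must hold: IT department AND salary above 70000.
it_high=$(printf '%s\n' "$rows" | awk '$4 == "IT" && $3 > 70000 {print $1, $2}')
echo "$it_high"   # → Alice Johnson

# Everyone outside the IT department.
not_it=$(printf '%s\n' "$rows" | awk '$4 != "IT" {print $1}')
echo "$not_it"
```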

4. BEGIN and END Blocks

Print a header before and summary after:

awk 'BEGIN {print "Name\t\tSalary\tDept"; print "----\t\t------\t----"} 
     {print $1, $2, "\t" $3, "\t" $4} 
     END {print "\nTotal employees: " NR}' employees.txt

Name		Salary	Dept
----		------	----
Alice Johnson 	75000 	IT
Bob Smith 	62000 	HR
Charlie Brown 	88000 	IT
Diana Wilson 	54000 	Marketing
Eve Davis 	95000 	IT
Frank Miller 	71000 	HR
Grace Lee 	82000 	Marketing
Henry Taylor 	67000 	IT

Total employees: 8

5. Built-in Variables NR and NF

Show line numbers and field counts:

awk '{print "Line " NR ": " NF " fields -> " $0}' employees.txt

Line 1: 4 fields -> Alice Johnson 75000 IT
Line 2: 4 fields -> Bob Smith 62000 HR
Line 3: 4 fields -> Charlie Brown 88000 IT
Line 4: 4 fields -> Diana Wilson 54000 Marketing
Line 5: 4 fields -> Eve Davis 95000 IT
Line 6: 4 fields -> Frank Miller 71000 HR
Line 7: 4 fields -> Grace Lee 82000 Marketing
Line 8: 4 fields -> Henry Taylor 67000 IT
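NF is also handy as a quick sanity check on input. The sketch below flags any line that does not have exactly four fields; the second input line is deliberately malformed for illustration.

```shell
# The second line is missing its salary field on purpose.
bad=$(printf 'Alice Johnson 75000 IT\nBob Smith HR\n' |
    awk 'NF != 4 {print "line " NR ": expected 4 fields, got " NF}')
echo "$bad"   # → line 2: expected 4 fields, got 3
```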

6. Conditionals with if/else

Categorize salaries:

awk '{ if ($3 >= 80000) print $1, $2, "— High salary"; 
       else if ($3 >= 65000) print $1, $2, "— Medium salary"; 
       else print $1, $2, "— Standard salary" }' employees.txt

Alice Johnson — Medium salary
Bob Smith — Standard salary
Charlie Brown — High salary
Diana Wilson — Standard salary
Eve Davis — High salary
Frank Miller — Medium salary
Grace Lee — High salary
Henry Taylor — Medium salary
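For short classifications like this, awk's ternary operator is a compact alternative to if/else. A minimal sketch with the same thresholds, using three rows copied from employees.txt:

```shell
rows='Alice Johnson 75000 IT
Bob Smith 62000 HR
Charlie Brown 88000 IT'

# cond ? a : b picks one of two values; the inner ternary handles the middle tier.
levels=$(printf '%s\n' "$rows" | awk '{
    level = ($3 >= 80000) ? "High" : (($3 >= 65000) ? "Medium" : "Standard")
    print $1, level
}')
echo "$levels"
```

Whether this reads better than the if/else chain is a matter of taste; beyond two or three tiers the explicit chain is usually clearer.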

7. Formatting Output with printf

Left-align the names and right-align the salary in fixed-width columns:

awk '{printf "%-15s %-10s %7d  %s\n", $1, $2, $3, $4}' employees.txt

Alice           Johnson      75000  IT
Bob             Smith        62000  HR
Charlie         Brown        88000  IT
Diana           Wilson       54000  Marketing
Eve             Davis        95000  IT
Frank           Miller       71000  HR
Grace           Lee          82000  Marketing
Henry           Taylor       67000  IT

8. Calculations

Compute total and average salary:

awk '{total += $3} END {print "Total: $" total; print "Average: $" int(total/NR)}' employees.txt

Total: $594000
Average: $74250
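int() truncates the fraction, which is harmless here but loses information when the average is not a whole number. The sketch below uses three made-up values to contrast a printf-formatted average with the truncated one, and tracks the maximum along the way.

```shell
stats=$(printf '10\n20\n25\n' | awk '
    NR == 1 || $1 > max { max = $1 }   # remember the largest value seen so far
    { total += $1 }
    END { printf "avg=%.2f truncated=%d max=%d\n", total / NR, int(total / NR), max }')
echo "$stats"   # → avg=18.33 truncated=18 max=25
```

This assumes the input is non-empty; with no input lines NR is 0 and the division fails.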

9. Arrays

Count employees per department (for-in visits array keys in an unspecified order, so your lines may appear in a different order):

awk '{depts[$4]++} END {for (d in depts) print d ": " depts[d]}' employees.txt

IT: 4
HR: 2
Marketing: 2
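The same associative-array idea extends from counting to summing. The sketch below totals salaries per department and pipes the result through sort, which is one simple way to get a stable order. The four rows are copied from employees.txt and inlined so the snippet runs on its own.

```shell
rows='Alice Johnson 75000 IT
Bob Smith 62000 HR
Frank Miller 71000 HR
Henry Taylor 67000 IT'

# sum[dept] accumulates the salary column; sort makes the output order predictable.
totals=$(printf '%s\n' "$rows" |
    awk '{sum[$4] += $3} END {for (d in sum) print d, sum[d]}' | sort)
echo "$totals"
```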

10. Field Separator with Output

Change separator for output (CSV format):

awk 'BEGIN {FS=" "; OFS=","} {print $1, $2, $3, $4}' employees.txt

Alice,Johnson,75000,IT
Bob,Smith,62000,HR
Charlie,Brown,88000,IT
Diana,Wilson,54000,Marketing
Eve,Davis,95000,IT
Frank,Miller,71000,HR
Grace,Lee,82000,Marketing
Henry,Taylor,67000,IT
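One detail worth knowing: OFS is applied only when awk joins fields, either in print $1, $2 or when it rebuilds $0 after a field is assigned. Printing $0 untouched keeps the original spacing. A minimal sketch of the common $1 = $1 idiom that forces the rebuild:

```shell
line='Alice Johnson 75000 IT'

# $0 has not been modified, so OFS has no effect on it.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN {OFS=","} {print $0}')
echo "$unchanged"   # → Alice Johnson 75000 IT

# Assigning any field makes awk rebuild $0 using OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN {OFS=","} {$1 = $1; print $0}')
echo "$rebuilt"     # → Alice,Johnson,75000,IT
```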

Common Use Cases

Parse Apache Access Logs

awk '{print $1, $9, $7}' /var/log/apache2/access.log | head -5

Sum File Sizes from `ls -l`

ls -l | awk '{total += $5} END {print "Total: " total " bytes"}'

Extract IPs from Auth Logs

awk '/Failed password/{print $(NF-3)}' /var/log/auth.log | sort | uniq -c | sort -rn
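To see why the command counts fields from the end, here is a sketch run against three hypothetical sshd log lines (the hostnames, PIDs, and addresses are invented, and real log formats can differ between systems). The number of words before the IP address varies, but the address sits three fields before the last one in both failure messages.

```shell
# Hypothetical auth.log excerpt; the addresses come from documentation ranges.
log='Jun 20 10:15:01 host sshd[1234]: Failed password for root from 203.0.113.42 port 51234 ssh2
Jun 20 10:15:04 host sshd[1235]: Failed password for invalid user admin from 198.51.100.7 port 40022 ssh2
Jun 20 10:15:09 host sshd[1236]: Accepted password for alice from 192.0.2.10 port 50000 ssh2'

# $(NF-3) is the field three positions before the last field.
ips=$(printf '%s\n' "$log" | awk '/Failed password/ {print $(NF-3)}')
echo "$ips"
```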

Print Lines Where Field N Matches Regex

awk '$4 ~ /^I/' employees.txt

Common Mistakes

1. Confusing `$` in awk vs Shell

Inside awk, $1 is the first field, NOT a shell variable. To use shell variables: awk -v var="$shell_var" '{print $1, var}'.
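A minimal sketch of the -v approach, with a hypothetical shell variable min_salary passed into awk as min:

```shell
min_salary=70000   # shell variable (hypothetical threshold)

# -v copies the shell value into the awk variable "min" before the program runs.
high=$(printf '%s\n' 'Alice Johnson 75000 IT' 'Bob Smith 62000 HR' |
    awk -v min="$min_salary" '$3 > min {print $1, $2}')
echo "$high"   # → Alice Johnson
```

Because the awk program stays inside single quotes, the shell never touches $3, and the only value crossing the boundary is the one named in -v.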

2. Forgetting That awk Indexes from 1

$0 is the whole line, $1 is the first field, and $NF is the last field. Fields are numbered from 1, so $0 never refers to the first field.

3. Using `print` Without a Separator

print $1 $2 concatenates the values. print $1, $2 adds a space (the OFS). Use printf for precise control.
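The three behaviours side by side, as a quick sketch on a single sample line:

```shell
# No comma: the two fields are concatenated with nothing between them.
concat=$(echo 'Alice Johnson' | awk '{print $1 $2}')
echo "$concat"   # → AliceJohnson

# Comma: the fields are joined with OFS, which defaults to one space.
spaced=$(echo 'Alice Johnson' | awk '{print $1, $2}')
echo "$spaced"   # → Alice Johnson

# Changing OFS changes what the comma inserts.
dashed=$(echo 'Alice Johnson' | awk 'BEGIN {OFS="-"} {print $1, $2}')
echo "$dashed"   # → Alice-Johnson
```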

4. Not Quoting awk Programs

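awk {print $1} fails because the shell processes the unquoted program first: it splits it on whitespace and expands $1 as a shell positional parameter, so awk never receives the program you wrote. Always use single quotes: awk '{print $1}'.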

Practice Questions

1. How do you print the last field of every line?

awk '{print $NF}' file.txt

2. What does awk 'NR > 1 {print}' file.txt do?

It prints all lines except the first line (skips header row).

3. How do you count the number of lines containing “ERROR”?

awk '/ERROR/{count++} END {print count + 0}' file.log

Adding 0 makes awk print 0 rather than an empty line when no line matches.

4. What’s the difference between print and printf in awk?

print automatically appends the output record separator (ORS, default newline) and separates fields with OFS. printf gives full control over formatting (width, alignment, decimal places).

5. Challenge: Write an awk one-liner that prints the top 3 highest-paid employees with their names and salaries.

awk '{print $3, $1, $2}' employees.txt | sort -rn | head -3 | awk '{print $2, $3, "— $" $1}'

Mini Project: Log Analysis Script

Create an awk script that analyzes an Apache access log:

#!/bin/bash
# analyze_log.awk — Run with: awk -f analyze_log.awk access.log

cat > analyze_log.awk << 'EOF'
BEGIN {
    print "=== Access Log Analysis ==="
    print ""
}

{
    total_requests++
    ip = $1
    status = $9
    page  = $7
    
    ips[ip]++
    statuses[status]++
    pages[page]++
    bytes += $10
}

END {
    print "Total requests: " total_requests
    print "Unique IPs: " length(ips)
    print "Total bytes transferred: " bytes
    print ""
    
    print "=== Top 10 IPs ==="
    for (ip in ips) {
        printf "%5d %s\n", ips[ip], ip | "sort -rn | head -10"
    }
    close("sort -rn | head -10")
    
    print ""
    print "=== Status Code Distribution ==="
    for (s in statuses) {
        printf "%5d %s\n", statuses[s], s | "sort -rn"
    }
    close("sort -rn")
}
EOF

Sample output (the figures are illustrative and depend on your log; the IP list is shortened):

=== Access Log Analysis ===

Total requests: 15234
Unique IPs: 892
Total bytes transferred: 234567890

=== Top 10 IPs ===
 2347 192.168.1.100
 1892 10.0.0.55
 1456 203.0.113.42

=== Status Code Distribution ===
12034 200
 1890 304
  890 404

FAQ

What does awk stand for?

It stands for Aho, Weinberger, Kernighan — the three creators. It was written at Bell Labs in 1977.

Is awk a programming language?

Yes. awk has variables, arrays, loops, conditionals, functions, and I/O. You can write substantial programs entirely in awk.

What’s the difference between awk and sed?

sed is a stream editor focused on line-based substitutions and deletions. awk is a full data processing language with field awareness, arithmetic, and control flow. Use awk when you need to work with columns or compute values.

Can awk process multiple files?

Yes. awk processes each file in sequence. NR counts across all files; FNR resets to 1 for each file.
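A small sketch of the difference, using two throwaway files created in a temporary directory (the file names are arbitrary):

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\n'    > "$tmpdir/second.txt"

# NR keeps counting across files; FNR restarts at 1 for second.txt.
counts=$(awk '{print NR, FNR, $0}' "$tmpdir/first.txt" "$tmpdir/second.txt")
echo "$counts"

rm -r "$tmpdir"
```

The last output line is "3 1 c": the third record overall, but the first record of its file. The condition FNR == 1 is therefore a common way to detect the start of each new file.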

What’s Next

xargs Command — Build and Execute

sed Command — Stream Editor

grep Command — Pattern Search

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
