awk Command in Linux — Text Processing with Examples
awk is a powerful text-processing language built into Linux. It reads input line by line, splits each line into fields, and lets you apply patterns, calculations, and formatting — all in one command.
What You’ll Learn
By the end of this tutorial, you’ll know how to print columns, set custom field separators, match patterns, use BEGIN/END blocks, work with built-in variables (NR, NF, FS), write conditionals, format output with printf, perform calculations, and use arrays.
Why awk Matters
awk is the go-to tool for column-based data extraction — parsing log files, processing CSV reports, analyzing system output, and transforming structured text. Durga Antivirus Pro uses awk to parse threat detection logs, and DodaZIP uses it to compute compression statistics from batch job outputs.
awk Learning Path
flowchart LR
A[sed Command] --> B[awk Command<br/>You are here]
B --> C[xargs Command]
C --> D[Shell Scripting]
D --> E[System Administration]
style B fill:#f90,color:#fff
Syntax Overview
awk 'pattern { action }' file...awk runs the action for every line that matches the pattern. If no pattern is given, the action runs on every line.
| Built-in Variable | Description |
|---|---|
$0 | Entire current line |
$1, $2, … | Individual fields |
NR | Current record (line) number |
NF | Number of fields in current line |
FS | Field separator (default: whitespace) |
OFS | Output field separator |
RS | Record separator (default: newline) |
ORS | Output record separator |
10 Practical Examples
Create a sample data file:
cat > employees.txt << 'EOF'
Alice Johnson 75000 IT
Bob Smith 62000 HR
Charlie Brown 88000 IT
Diana Wilson 54000 Marketing
Eve Davis 95000 IT
Frank Miller 71000 HR
Grace Lee 82000 Marketing
Henry Taylor 67000 IT
EOF1. Print Columns
Print first and last name (columns 1 and 2):
awk '{print $1, $2}' employees.txtAlice Johnson
Bob Smith
Charlie Brown
Diana Wilson
Eve Davis
Frank Miller
Grace Lee
Henry Taylor2. Custom Field Separator
Process /etc/passwd with colon separator:
awk -F':' '{print $1, $6}' /etc/passwd | head -5root /root
daemon /usr/sbin
bin /bin
sys /dev
sync /bin3. Pattern Matching
Print IT department employees:
awk '$4 == "IT" {print $1, $2}' employees.txtAlice Johnson
Charlie Brown
Eve Davis
Henry TaylorPrint employees earning more than 70,000:
awk '$3 > 70000 {print $1, $2, $3}' employees.txtAlice Johnson 75000
Charlie Brown 88000
Eve Davis 95000
Frank Miller 71000
Grace Lee 82000
Henry Taylor 670004. BEGIN and END Blocks
Print a header before and summary after:
awk 'BEGIN {print "Name\t\tSalary\tDept"; print "----\t\t------\t----"}
{print $1, $2, "\t" $3, "\t" $4}
END {print "\nTotal employees: " NR}' employees.txtName Salary Dept
---- ------ ----
Alice Johnson 75000 IT
Bob Smith 62000 HR
Charlie Brown 88000 IT
Diana Wilson 54000 Marketing
Eve Davis 95000 IT
Frank Miller 71000 HR
Grace Lee 82000 Marketing
Henry Taylor 67000 IT
Total employees: 85. Built-in Variables NR and NF
Show line numbers and field counts:
awk '{print "Line " NR ": " NF " fields -> " $0}' employees.txtLine 1: 4 fields -> Alice Johnson 75000 IT
Line 2: 4 fields -> Bob Smith 62000 HR
Line 3: 4 fields -> Charlie Brown 88000 IT
Line 4: 4 fields -> Diana Wilson 54000 Marketing
Line 5: 4 fields -> Eve Davis 95000 IT
Line 6: 4 fields -> Frank Miller 71000 HR
Line 7: 4 fields -> Grace Lee 82000 Marketing
Line 8: 4 fields -> Henry Taylor 67000 IT6. Conditionals with if/else
Categorize salaries:
awk '{ if ($3 >= 80000) print $1, $2, "— High salary";
else if ($3 >= 65000) print $1, $2, "— Medium salary";
else print $1, $2, "— Standard salary" }' employees.txtAlice Johnson — Medium salary
Bob Smith — Standard salary
Charlie Brown — High salary
Diana Wilson — Standard salary
Eve Davis — High salary
Frank Miller — Medium salary
Grace Lee — High salary
Henry Taylor — Medium salary7. Formatting Output with printf
Right-aligned columns with fixed width:
awk '{printf "%-15s %-10s %7d %s\n", $1, $2, $3, $4}' employees.txtAlice Johnson 75000 IT
Bob Smith 62000 HR
Charlie Brown 88000 IT
Diana Wilson 54000 Marketing
Eve Davis 95000 IT
Frank Miller 71000 HR
Grace Lee 82000 Marketing
Henry Taylor 67000 IT8. Calculations
Compute total and average salary:
awk '{total += $3} END {print "Total: $" total; print "Average: $" int(total/NR)}' employees.txtTotal: $611000
Average: $763759. Arrays
Count employees per department:
awk '{depts[$4]++} END {for (d in depts) print d ": " depts[d]}' employees.txtIT: 4
HR: 2
Marketing: 210. Field Separator with Output
Change separator for output (CSV format):
awk 'BEGIN {FS=" "; OFS=","} {print $1, $2, $3, $4}' employees.txtAlice,Johnson,75000,IT
Bob,Smith,62000,HR
Charlie,Brown,88000,IT
Diana,Wilson,54000,Marketing
Eve,Davis,95000,IT
Frank,Miller,71000,HR
Grace,Lee,82000,Marketing
Henry,Taylor,67000,ITCommon Use Cases
Parse Apache Access Logs
awk '{print $1, $9, $7}' /var/log/apache2/access.log | head -5Sum File Sizes from ls -l
ls -l | awk '{total += $5} END {print "Total: " total " bytes"}'Extract IPs from Auth Logs
awk '/Failed password/{print $(NF-3)}' /var/log/auth.log | sort | uniq -c | sort -rnPrint Lines Where Field N Matches Regex
awk '$4 ~ /^I/' employees.txtCommon Mistakes
1. Confusing $ in awk vs Shell
Inside awk, $1 is the first field, NOT a shell variable. To use shell variables: awk -v var="$shell_var" '{print $1, var}'.
2. Forgetting That awk Indexes from 1
Field $0 is the whole line, $1 is the first field, $NF is the last field. There is no field zero for the first field.
3. Using print Without a Separator
print $1 $2 concatenates the values. print $1, $2 adds a space (the OFS). Use printf for precise control.
4. Not Quoting awk Programs
awk {print $1} will fail because the shell interprets the braces. Always use single quotes: awk '{print $1}'.
Practice Questions
1. How do you print the last field of every line?
awk '{print $NF}' file.txt
2. What does awk 'NR > 1 {print}' file.txt do?
It prints all lines except the first line (skips header row).
3. How do you count the number of lines containing “ERROR”?
awk '/ERROR/{count++} END {print count}' file.log
4. What’s the difference between print and printf in awk?
print automatically appends the output record separator (ORS, default newline) and separates fields with OFS. printf gives full control over formatting (width, alignment, decimal places).
5. Challenge: Write an awk one-liner that prints the top 3 highest-paid employees with their names and salaries.
awk '{print $3, $1, $2}' employees.txt | sort -rn | head -3 | awk '{print $2, $3, "— $" $1}'
Mini Project: Log Analysis Script
Create an awk script that analyzes an Apache access log:
#!/bin/bash
# analyze_log.awk — Run with: awk -f analyze_log.awk access.log
cat > analyze_log.awk << 'EOF'
BEGIN {
print "=== Access Log Analysis ==="
print ""
}
{
total_requests++
ip = $1
status = $9
page = $7
ips[ip]++
statuses[status]++
pages[page]++
bytes += $10
}
END {
print "Total requests: " total_requests
print "Unique IPs: " length(ips)
print "Total bytes transferred: " bytes
print ""
print "=== Top 10 IPs ==="
for (ip in ips) {
printf "%5d %s\n", ips[ip], ip | "sort -rn | head -10"
}
close("sort -rn | head -10")
print ""
print "=== Status Code Distribution ==="
for (s in statuses) {
printf "%5d %s\n", statuses[s], s | "sort -rn"
}
close("sort -rn")
}
EOFExpected output (against an access log):
=== Access Log Analysis ===
Total requests: 15234
Unique IPs: 892
Total bytes transferred: 234567890
=== Top 10 IPs ===
2347 192.168.1.100
1892 10.0.0.55
1456 203.0.113.42
=== Status Code Distribution ==~
12034 200
1890 304
890 404FAQ
What’s Next
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro