To compress with tar, you just need to add a compression flag like -z for gzip. A classic example is tar -czvf archive.tar.gz /path/to/directory, which first bundles all the files into a single archive and then compresses it. This two-step process of bundling and compressing is what makes tar so powerful in any Linux or macOS environment.
Let’s be honest, the tar command feels ancient—a relic from the days of physical tape backups. Its name is literally short for Tape ARchive, a nod to its origins back in 1979. But in the world of cloud computing and DevOps, knowing how to compress with tar isn't just a retro skill; it's a secret weapon for slashing your cloud bill. This isn't just about zipping files. It’s about making smart choices that directly cut your storage and data transfer costs on platforms like AWS and GCP.

The link between this old-school command and your modern cloud budget is surprisingly direct. Every gigabyte you store or transfer costs money, plain and simple. When you efficiently bundle application logs, package build artifacts for a CI/CD pipeline, or create database backups with the right compression, you're generating real savings. Automate these tasks, as you can learn in our guide to Python automation scripts, and the cumulative effect on your FinOps goals can be massive.
Callout: The core principle is simple: smaller files cost less to store and move. Choosing the right tar compression method is a key lever in controlling those costs.
The tar command itself doesn't actually compress anything; it just bundles files together. It relies on external utilities that you can call with a simple flag. Understanding the trade-offs here is the key to using it effectively. Gzip is incredibly fast, which makes it perfect for on-the-fly tasks. On the other hand, bzip2 and xz offer much better compression ratios, but they'll chew up more CPU cycles to get there. This choice becomes critical for long-term storage, where every single byte counts. The table below offers a quick comparison of the most common compression methods used with tar.
| Method | Flag | Speed | Compression Ratio | Best For |
|---|---|---|---|---|
| Gzip | -z | Fastest | Good | CI/CD pipelines, frequent log rotation |
| Bzip2 | -j | Slower | Better | Less frequent backups, archival tasks |
| XZ | -J | Slowest | Best | Long-term cold storage (e.g., S3 Glacier) |
By picking the right tool for the job, you turn a simple command into a powerful instrument for cloud cost management. It just goes to show that even decades-old tools have a vital place in a modern DevOps toolkit.
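If you want to see the trade-off on your own data rather than take the table's word for it, you can run all three methods against the same directory and compare the output sizes. This sketch builds a throwaway directory of repetitive sample logs (the paths, file names, and sizes are all made up for illustration) and guards the bzip2 and xz steps in case those utilities aren't installed:

```shell
# Build a temp directory of highly compressible sample log data.
workdir=$(mktemp -d)
mkdir -p "$workdir/logs"
for i in $(seq 1 50); do
  yes "2024-01-0$((i % 9 + 1)) INFO request handled in 12ms" | head -n 2000 \
    > "$workdir/logs/app-$i.log"
done

cd "$workdir"
tar -czf logs.tar.gz logs                                  # gzip: fastest
command -v bzip2 >/dev/null && tar -cjf logs.tar.bz2 logs  # bzip2: better ratio, slower
command -v xz >/dev/null && tar -cJf logs.tar.xz logs      # xz: best ratio, slowest

ls -l logs.tar.*   # compare the resulting archive sizes side by side
```

On text-heavy data like logs, the size differences between the three archives are usually obvious at a glance; on already-compressed data (images, videos), all three will show barely any gain.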
Forget the generic foo and bar examples you see everywhere. Let's talk about real-world use cases where you need to compress with tar. You might need to archive a month’s worth of application logs from /var/log/my-app before they get rotated, or maybe you're packaging a new build artifact for your CI/CD pipeline. This is where tar becomes an indispensable tool in your command-line arsenal. The basic structure for creating a compressed archive always involves a few key flags: -c to create a new archive, -v for verbose output (so you can see what’s happening), and -f to specify the output file name. When you combine these with a compression flag, you have a complete command ready to go.

The compression flag you pick isn't just a technical detail—it's a strategic choice that directly impacts the final file size and how long the process takes. Your three main options are gzip (-z), which creates a .tar.gz file and is a workhorse for speed; bzip2 (-j), which creates a .tar.bz2 file with a better compression ratio; and xz (-J), for .tar.xz files offering the best possible compression at the cost of speed. By 1993, GNU tar had integrated gzip compression with the -z flag, which was a game-changer. A 1GB directory of text-based logs could suddenly shrink down to just 200MB—an 80% reduction in a single pass. You can explore more on how tar compression works on Linux to get a deeper appreciation for it.
Let's put these flags into action. Suppose you need to archive that my-app log directory we mentioned earlier. Here’s how you’d tackle it with each compression type:
```shell
# Using gzip for a fast, balanced archive
tar -czvf app-logs.tar.gz /var/log/my-app/

# Using bzip2 for better compression
tar -cjvf app-logs.tar.bz2 /var/log/my-app/

# Using xz for maximum compression (long-term storage)
tar -cJvf app-logs.tar.xz /var/log/my-app/
```
A Personal Tip: I always include the verbose (-v) flag when creating archives. When you're bundling a directory with thousands of files, seeing the filenames scroll by gives you immediate feedback that the command is working and hasn't hung. It has saved me from hours of second-guessing more times than I can count.
Mastering these commands is a solid first step. When you're ready to build more powerful automation around these concepts, check out our comprehensive Bash scripting cheat sheet.
Making a .tar.gz file is easy. The real work begins when you actually need something out of it. A backup you can't restore is just a waste of disk space. Knowing how to manage your archives—listing contents without making a mess, pulling out a single file, or verifying the whole thing is intact—is a fundamental DevOps skill. It’s the difference between a quick fix and a stressful data recovery scramble.
Before you ever unpack an archive, you should always check what's inside. Blindly extracting a multi-gigabyte backup into your current directory can overwrite files and create a huge mess. Instead, use the -t (or --list) flag. It shows you the contents without extracting a single file. For example, tar -tzf app-logs-backup.tar.gz will list the contents of a gzipped archive. When you're ready to get your files back, the -x (or --extract) flag is what you need. tar -xzf app-logs-backup.tar.gz will unpack everything right where you are.
However, you can be much more surgical. Just tell tar the exact path to the file you want, like tar -xzf big-backup.tar.gz path/to/critical.conf. To avoid clutter, I always recommend directing the output to a specific folder using the -C (or --directory) flag. For example: tar -xzf app-logs-backup.tar.gz -C /tmp/restore/. This level of precision turns a frantic recovery operation into a controlled, predictable task. That same control is vital when you're managing files on remote machines. If you're doing that often, you might find our guide on how to efficiently download files over SSH useful.
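Here is that inspect-then-extract workflow end to end. The sketch builds a small sample archive first (the app/conf and app/logs layout and file names are invented for the example) so the -t, single-file, and -C commands have something real to operate on:

```shell
# Build a small sample archive to practice on.
workdir=$(mktemp -d); cd "$workdir"
mkdir -p app/conf app/logs
echo "debug=false" > app/conf/critical.conf
echo "boot ok" > app/logs/app.log
tar -czf app-backup.tar.gz app

# 1. Inspect before you extract: -t lists contents without unpacking anything.
tar -tzf app-backup.tar.gz

# 2. Surgical extraction: pull out one file by its exact path inside the archive.
tar -xzf app-backup.tar.gz app/conf/critical.conf

# 3. Controlled extraction: unpack everything into a dedicated directory with -C.
mkdir restore
tar -xzf app-backup.tar.gz -C restore
```

The -C flag is the habit worth building: extraction lands exactly where you point it, never scattered across whatever directory you happen to be standing in.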
Ready to level up your tar game? Once you're comfortable creating and extracting archives, the real power comes from weaving tar into your automated workflows. This is where you graduate from one-off commands to building efficient scripts for backups, deployments, and data migrations in the cloud. One of the most powerful tar tricks is streaming an archive directly over SSH. This lets you move entire directories between servers without creating a temporary file on disk first. The magic happens by piping the output of one tar command directly into another one running on the remote machine. For example: tar -czf - /path/to/source | ssh user@remote_server 'tar -xzf - -C /path/to/destination'. This technique is invaluable for deploying code or syncing environments.
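You can try the same pipe pattern locally, minus the ssh hop, to see how the streaming works before pointing it at a real server (the source/dest directory names here are placeholders):

```shell
workdir=$(mktemp -d); cd "$workdir"
mkdir -p source dest
echo "hello" > source/file.txt

# The first tar writes the compressed archive to stdout (-f -);
# the second reads it from stdin (-f -) and unpacks into dest.
# Over SSH, the pipe simply crosses the network instead.
tar -czf - source | tar -xzf - -C dest
```

No intermediate .tar.gz file ever touches the disk; the archive exists only as a stream between the two processes, which is exactly why this trick is so useful on servers with tight disk space.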
Another pro-level technique involves using the --exclude flag to create clean, lean archives. Including noisy node_modules directories or sensitive .env files just bloats your backups and can create security risks. A command like tar -czvf project.tar.gz --exclude='node_modules' --exclude='.env' /path/to/project/ ensures you only archive what's necessary. Finally, for cloud storage services like Amazon S3, it's often better to split large archives into smaller chunks by combining tar with the split command: tar -czf - /var/logs | split --bytes=200M - "log-backup.tar.gz.". This produces a series of 200MB files that are far easier to upload, download, and manage. The pattern is common in professional environments, a tactic detailed in guides on how tar compression is used in professional Linux environments, and it shows up frequently in robust, scheduled automation scripts like those discussed in our guide on setting up a crontab to run every 15 minutes.
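The exclude-and-split pattern can be sketched end to end, including the reassembly step the chunks exist for. This example uses a tiny invented project layout and 1K chunks so it runs anywhere; in practice you would use something like --bytes=200M:

```shell
workdir=$(mktemp -d); cd "$workdir"
mkdir -p project/src project/node_modules
echo "console.log('hi')" > project/src/index.js
echo "bulk dependency"   > project/node_modules/dep.js
echo "SECRET=x"          > project/.env

# Archive only what matters, leaving out dependencies and secrets.
tar -czf project.tar.gz --exclude='node_modules' --exclude='.env' project

# Stream a tar through split to produce fixed-size chunks
# (suffixes aa, ab, ... are appended to the given prefix).
tar -czf - --exclude='node_modules' --exclude='.env' project \
  | split --bytes=1K - "project.tar.gz.part-"

# Reassemble and extract: concatenate the chunks back into one stream.
mkdir restore
cat project.tar.gz.part-* | tar -xzf - -C restore
```

Because cat restores the byte stream exactly, tar never knows the archive was ever split; the chunks only exist to make uploads and retries manageable.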
Picking the right compression when you compress with tar isn't just about flags—it's a strategic choice that impacts performance, workflows, and your cloud bill. The decision always comes down to a classic trade-off: do you need it done fast, or do you need it packed small? In many DevOps scenarios, like packaging build artifacts in a CI/CD pipeline, speed is the only thing that matters. This is where gzip (-z) shines. It’s lightning-fast and sips CPU, making it the perfect tool for high-frequency, automated tasks.

On the other side is long-term archival. When shipping terabytes of data to cold storage like Amazon S3 Glacier, your only goal is to shrink the data as much as possible to slash storage costs. This is the home turf for xz (-J) and bzip2 (-j). These methods are CPU-intensive but provide significantly smaller files. Your choice is all about context: use tar.gz for speed in your hot paths (like deployments) and tar.xz for cost savings in your cold paths (like compliance archives). Choosing the right compression method is a key part of your overall cloud cost optimization strategies.
| Method | Compression Ratio | Compression Speed | Memory Usage | Best For |
|---|---|---|---|---|
| gzip | Good | Fastest | Low | CI/CD artifacts, frequent log rotations |
| bzip2 | Better | Slow | High | Weekly backups, application archives |
| xz | Best | Slowest | Very High | Long-term cold storage, compliance data |
Talk to any DevOps engineer, and they'll have a tar horror story. Mastering this tool isn't just about memorizing flags; it's about sidestepping the common traps that turn a routine backup into a full-blown catastrophe. One of the first mistakes everyone makes is using absolute paths when creating an archive, like tar -czf backup.tar.gz /home/user/data. The stored paths then point back at /home/user/data. Modern GNU tar strips the leading slash on extraction as a safety measure, but with the -P flag, or with older or non-GNU implementations, extraction can write straight back to the original absolute path and overwrite critical files.
The only sane way to create archives is with relative paths. The trick is to cd into the parent directory first, then create the archive using the folder name. For example: cd /home/user/ and then tar -czf backup.tar.gz data. Now, when you extract backup.tar.gz, it simply creates a data directory right where you are. It's predictable and safe. Beyond pathing, a few flags are non-negotiable for creating reliable archives. The most overlooked is -p (--preserve-permissions), which is crucial for system backups as it keeps file permissions and ownership intact. Forgetting this can lead to a "successful" restore that leaves your app totally broken, which is why understanding file permissions in Linux is so important. Building a solid backup strategy is fundamental in any environment, such as when you learn how to backup your WordPress site.
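Putting both habits together, a safe backup looks like this. The sketch uses a throwaway directory standing in for /home/user (paths and the 640 permission are illustrative):

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/user/data"
echo "payload" > "$workdir/user/data/file.txt"
chmod 640 "$workdir/user/data/file.txt"

# cd to the parent first, so the archive stores the relative path "data/".
cd "$workdir/user"
tar -czpf backup.tar.gz data    # -p records permissions faithfully

# Extracting elsewhere recreates ./data under the target, never /home/user/data.
mkdir "$workdir/restore"
tar -xzpf backup.tar.gz -C "$workdir/restore"

ls -l "$workdir/restore/data/file.txt"   # permissions survive the round trip
```

Note that -p on extraction matters most when running as a regular user; without it, your umask can silently rewrite the restored permissions.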
Even after you’ve mastered the tar basics, a few tricky questions always seem to pop up. For instance, many people wonder if they can add files to an already compressed archive. You can append to a plain, uncompressed .tar archive with the -r flag, but it's not possible to append directly to a compressed archive like .tar.gz. You must decompress it, add the new files, and re-compress it.
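That decompress-append-recompress cycle is short enough to show in full. This sketch uses invented file names; the key detail is that -r only works on the plain .tar in the middle step:

```shell
workdir=$(mktemp -d); cd "$workdir"
echo "one" > a.txt
echo "two" > b.txt
tar -czf docs.tar.gz a.txt   # start with a compressed archive of one file

# You can't append to docs.tar.gz directly, so:
gunzip docs.tar.gz           # 1. decompress, leaving a plain docs.tar
tar -rf docs.tar b.txt       # 2. -r appends to the *uncompressed* archive
gzip docs.tar                # 3. re-compress back to docs.tar.gz

tar -tzf docs.tar.gz         # now lists both a.txt and b.txt
```

For archives you expect to grow frequently, this round trip gets expensive; it's often simpler to keep a plain .tar during the day and compress it once at the end.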
Another common question relates to file permissions. By default, tar tries to preserve permissions on extraction, but it's not a guarantee. To be 100% certain, always use the -p (or --preserve-permissions) flag. This is non-negotiable when archiving system files or application code. People also ask about the difference between tar and zip. The core difference is that tar bundles files first, then compresses the single bundle, which often leads to better compression ratios. Zip, in contrast, compresses each file individually before adding it to the archive.
Finally, the choice between faster or smaller compression depends entirely on your goal. For frequent, automated jobs like CI/CD artifacts, the speed of gzip (-z) is your best bet. For long-term archival where minimizing cloud storage costs is the priority, the higher compression ratios of xz (-J) or bzip2 (-j) are the clear winners.
Ready to stop manually managing your cloud resources? With Server Scheduler, you can automate start/stop times, resizes, and reboots for your AWS infrastructure with a simple, visual interface. Cut your cloud bill by up to 70% and free up your team for more important work. Explore Server Scheduler today!