Linux – Using ‘split’ command

The split command in Unix and Linux is used to divide a large file into smaller files. It’s particularly useful when you need to handle a large file in chunks or split data for easier processing or transmission.

Here’s a detailed breakdown of the usage of the split command:

Basic Syntax:

Bash
split [OPTION]... [INPUT [PREFIX]]

  • INPUT: The name of the file you want to split. If INPUT is omitted or given as -, split reads from standard input.
  • PREFIX: The prefix for the names of the output files. The default prefix is x, so the output files are named xaa, xab, xac, etc. If a prefix is provided, the output files use it instead (e.g., fileaa, fileab, etc.).
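
The naming rule above can be checked with a small throwaway experiment. This is only a sketch: the scratch directory and the file name input.txt are made up for illustration, and the -l option used to get several pieces is explained further down.

```shell
# Scratch directory; the file names below are illustrative only.
workdir=$(mktemp -d)
seq 1 10 > "$workdir/input.txt"

# split writes into the current directory unless PREFIX contains a path,
# so run both commands from inside the scratch directory.
(
  cd "$workdir" || exit 1
  split -l 4 input.txt          # no PREFIX: xaa, xab, xac
  split -l 4 input.txt file     # PREFIX "file": fileaa, fileab, fileac
)

# List everything that now exists in the scratch directory.
names=$(ls "$workdir" | paste -sd ' ' -)
echo "$names"

rm -rf "$workdir"
```

Ten lines at four lines per piece give three pieces per run, so the listing shows three x* files and three file* files next to the input.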

Commonly Used Options:

1. Split by Size:
To split a file based on size, use the -b option:

Bash
split -b SIZE INPUT PREFIX

  • SIZE: Size of each chunk in bytes, optionally followed by a unit. In GNU split, K, M, and G are powers of 1024 (KiB, MiB, GiB), while KB, MB, and GB are powers of 1000.
    • Example: split -b 10M largefile part_
    • This splits largefile into chunks of 10 MiB each (the last chunk may be smaller), with names like part_aa, part_ab, etc.
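
To see how the last chunk behaves, here is a minimal sketch with a deliberately tiny file. The name sample.bin and the 1000-byte chunk size are assumptions chosen so the result is easy to read; they are not part of the original example.

```shell
workdir=$(mktemp -d)

# Create a 2500-byte sample file (hypothetical name: sample.bin).
head -c 2500 /dev/zero > "$workdir/sample.bin"

# Split into pieces of at most 1000 bytes each.
split -b 1000 "$workdir/sample.bin" "$workdir/part_"

# Collect the size of every piece, in suffix order (aa, ab, ac).
sizes=$(wc -c "$workdir"/part_* | awk '$2 != "total" {print $1}' | paste -sd ' ' -)
echo "$sizes"   # 1000 1000 500

rm -rf "$workdir"
```

Every piece is exactly SIZE bytes except the last one, which holds whatever is left over.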

2. Split by Number of Lines:
To split a file based on the number of lines in each chunk:

Bash
split -l NUMBER INPUT PREFIX

  • NUMBER: The number of lines each output file should have.
    • Example: split -l 1000 data.txt output_
    • This splits data.txt into files of 1000 lines each (the last file holds the remainder) with names like output_aa, output_ab, etc.
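
A quick way to convince yourself that the pieces line up is to number the lines first. The sketch below assumes a generated 2500-line data.txt rather than real data.

```shell
workdir=$(mktemp -d)

# Generate 2500 numbered lines so each line's position is visible.
seq 1 2500 > "$workdir/data.txt"

split -l 1000 "$workdir/data.txt" "$workdir/output_"

# Line count of each piece, in suffix order.
line_counts=$(wc -l "$workdir"/output_* | awk '$2 != "total" {print $1}' | paste -sd ' ' -)

# First line of each piece shows where each one starts.
first_lines=$(for f in "$workdir"/output_*; do head -n 1 "$f"; done | paste -sd ' ' -)

echo "$line_counts"   # 1000 1000 500
echo "$first_lines"   # 1 1001 2001

rm -rf "$workdir"
```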

3. Split by Number of Files:
To split a file into a specific number of output files (a GNU option):

Bash
split -n NUMBER INPUT PREFIX

  • NUMBER: The number of chunks (files) to create.
    • Example: split -n 5 bigfile part_
    • This splits bigfile into 5 parts of roughly equal size in bytes, named part_aa through part_ae. Lines may be cut at the chunk boundaries.
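
As a rough illustration, assuming GNU split and a made-up 12000-byte file whose size divides evenly by the number of parts:

```shell
workdir=$(mktemp -d)

# 12000 bytes divide evenly into 4 parts of 3000 bytes.
head -c 12000 /dev/zero > "$workdir/bigfile"

split -n 4 "$workdir/bigfile" "$workdir/part_"

sizes=$(wc -c "$workdir"/part_* | awk '$2 != "total" {print $1}' | paste -sd ' ' -)
echo "$sizes"   # 3000 3000 3000 3000

rm -rf "$workdir"
```

When the file size is not a multiple of the number of parts, the sizes differ slightly; how the leftover bytes are distributed is best checked against your installed version.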

4. Split with Numeric Suffix:
By default, split uses alphabetical suffixes (xaa, xab, etc.). To use numeric suffixes instead (a GNU option):

Bash
split --numeric-suffixes=1 INPUT PREFIX

  • The value after = is the starting number. The short form -d (or --numeric-suffixes without a value) starts at 0.
  • Example: split --numeric-suffixes=1 largefile part_
  • This creates files with names like part_01, part_02, etc.
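
The effect of the starting value can be compared side by side. This sketch assumes GNU split; the prefixes zero_ and part_ and the 30-line input are invented for the demonstration.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/out"
seq 1 30 > "$workdir/data.txt"

# -d: numeric suffixes starting at 00.
split -l 10 -d "$workdir/data.txt" "$workdir/out/zero_"

# --numeric-suffixes=1: numeric suffixes starting at 01.
split -l 10 --numeric-suffixes=1 "$workdir/data.txt" "$workdir/out/part_"

names=$(ls "$workdir/out" | paste -sd ' ' -)
echo "$names"   # part_01 part_02 part_03 zero_00 zero_01 zero_02

rm -rf "$workdir"
```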

5. Custom Suffix Length:
To specify the length of the suffix (the default is 2 characters):

Bash
split -a LENGTH INPUT PREFIX

  • Example: split -a 3 largefile part_
  • This creates files with names like part_aaa, part_aab, etc., where the suffix is 3 characters long.
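
The suffix length also limits how many output files are possible, which the following sketch tries to make visible. It assumes a 30-line input split into one-line pieces; the prefixes short_ and long_ are hypothetical.

```shell
workdir=$(mktemp -d)
seq 1 30 > "$workdir/data.txt"

# One-character suffixes allow only 26 names (a..z), so 30 one-line
# pieces do not fit and split exits with an error.
if split -a 1 -l 1 "$workdir/data.txt" "$workdir/short_" 2>/dev/null; then
  short_result=ok
else
  short_result=failed
fi

# Three-character suffixes leave plenty of room.
split -a 3 -l 1 "$workdir/data.txt" "$workdir/long_"
long_count=$(ls "$workdir" | grep -c '^long_')
first_long=$(ls "$workdir" | grep '^long_' | head -n 1)

echo "$short_result $long_count $first_long"   # failed 30 long_aaa

rm -rf "$workdir"
```

If you expect many pieces, choosing a longer suffix up front avoids a split that stops partway through.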

6. Verbose Output:
To see which files are being created during the split operation:

Bash
split --verbose INPUT PREFIX

  • This option prints the name of each output file just before it is opened.

7. Split by Size Without Breaking Lines:
If you want to cap the size of each chunk while keeping lines intact:

Bash
split -C SIZE INPUT PREFIX

  • This packs as many complete lines as fit into each output file without exceeding SIZE bytes. Only a single line longer than SIZE is broken across files.
  • Example: split -C 1M largefile part_ ensures each file is no larger than 1 MiB and doesn’t split inside lines.
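
The difference between -C and -b is easiest to see on a tiny file. The sketch below assumes GNU split and an invented lines.txt made of five 5-byte lines, with a 12-byte limit picked so that two lines fit but three do not.

```shell
workdir=$(mktemp -d)

# Five lines of 5 bytes each ("aaaa" plus newline), 25 bytes in total.
for i in 1 2 3 4 5; do echo aaaa; done > "$workdir/lines.txt"

# -C 12: at most 12 bytes per file, but only whole lines (2 lines = 10 bytes).
split -C 12 "$workdir/lines.txt" "$workdir/whole_"

# -b 12: exactly 12 bytes per file, cutting through lines if necessary.
split -b 12 "$workdir/lines.txt" "$workdir/cut_"

whole_sizes=$(wc -c "$workdir"/whole_* | awk '$2 != "total" {print $1}' | paste -sd ' ' -)
cut_sizes=$(wc -c "$workdir"/cut_* | awk '$2 != "total" {print $1}' | paste -sd ' ' -)

echo "$whole_sizes"   # 10 10 5
echo "$cut_sizes"     # 12 12 1

rm -rf "$workdir"
```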

8. Round Robin Split:
If you want to split a file by distributing lines in a round-robin manner across several output files:

Bash
split --number=r/N INPUT PREFIX

  • N specifies how many files to split into, and r selects round-robin distribution: line 1 goes to the first file, line 2 to the second, and so on, wrapping around after file N. (The similar l/N form also keeps lines whole but gives each file a contiguous block of lines.)
  • Example: split --number=r/3 inputfile part_ distributes the lines of inputfile across 3 files in a round-robin fashion.
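
In GNU split, the round-robin mode is spelled r/N, whereas l/N keeps lines whole but hands each file a contiguous block. The following sketch compares the two on a made-up six-line file; the prefixes rr_ and seq_ are illustrative.

```shell
workdir=$(mktemp -d)
seq 1 6 > "$workdir/inputfile"

# r/3: deal lines out like cards (1 -> aa, 2 -> ab, 3 -> ac, 4 -> aa, ...).
split -n r/3 "$workdir/inputfile" "$workdir/rr_"

# l/3: three contiguous blocks of whole lines.
split -n l/3 "$workdir/inputfile" "$workdir/seq_"

rr_first=$(paste -sd ' ' "$workdir/rr_aa")
rr_joined=$(cat "$workdir"/rr_* | paste -sd ' ' -)

# Concatenating the l/3 pieces in order gives back the original file.
if cat "$workdir"/seq_* | cmp -s - "$workdir/inputfile"; then
  seq_restores=yes
else
  seq_restores=no
fi

echo "$rr_first"      # 1 4
echo "$rr_joined"     # 1 4 2 5 3 6
echo "$seq_restores"  # yes

rm -rf "$workdir"
```

Note that concatenating round-robin pieces does not reproduce the original line order, so this mode suits workloads where order does not matter, such as spreading records across parallel workers.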

Examples:

1. Split a file into chunks of 1 MB each:
Bash
split -b 1M largefile part_

This splits largefile into chunks of 1 MiB (1,048,576 bytes) each, with file names part_aa, part_ab, and so on.

2. Split a file into files with 500 lines each:
Bash
split -l 500 inputfile segment_

This splits inputfile into multiple files, each containing 500 lines, with names like segment_aa, segment_ab, etc.

3. Split a file into 4 equal parts:
Bash
split -n 4 inputfile chunk_

This splits inputfile into 4 files of roughly equal size in bytes, named chunk_aa through chunk_ad. Because the cut points are byte offsets, a line may be split across two files.

Handling Binary Files:

The -b option cuts purely by byte count and never inspects or alters the data, so it is safe for binary files. Avoid the line-based options (-l, -C) here, since binary data has no meaningful line boundaries. Example:

Bash
split -b 512k binaryfile binpart_

This splits a binary file into 512 KiB chunks.

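Recombining Split Files:

To reassemble the split files into one, use the cat command:

Bash
cat part_* > combined_file

The shell expands part_* in sorted order, which matches the order of split’s suffixes (aa, ab, ac, ...), so the pieces are concatenated in their original sequence. This restores the original file only for contiguous splits (-b, -l, -C, -n N); pieces produced by a round-robin split cannot be rejoined this way.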
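
After recombining, it is worth verifying that the result is byte-for-byte identical to the original. A minimal sketch, assuming a randomly generated original.bin and a 30000-byte chunk size chosen only for the demonstration:

```shell
workdir=$(mktemp -d)

# 100000 bytes of random data stand in for a real binary file.
head -c 100000 /dev/urandom > "$workdir/original.bin"

split -b 30000 "$workdir/original.bin" "$workdir/part_"
piece_count=$(ls "$workdir" | grep -c '^part_')

# Rejoin the pieces in suffix order.
cat "$workdir"/part_* > "$workdir/combined_file"

# cmp -s is silent and reports equality through its exit status.
if cmp -s "$workdir/original.bin" "$workdir/combined_file"; then
  match=yes
else
  match=no
fi

echo "$piece_count pieces, identical: $match"   # 4 pieces, identical: yes

rm -rf "$workdir"
```

When the pieces travel to another machine, comparing checksums (for example with sha256sum) of the original and the recombined file serves the same purpose.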

Conclusion:

The split command is a powerful tool for dividing files into smaller pieces based on size, line count, or number of output files. It offers flexibility with custom prefixes, suffix lengths, and verbose output for ease of use. It’s commonly used when managing large datasets, splitting logs, or distributing files across systems.

                    OpenLib .

                    The Founder - OpenLib.io
