
[PHP] File comparison - Extracting data contained in only one of the files

Published 2022/03/03

Overview

Code that compares two files and extracts the data contained only in one of them (the comparison source).
Each element is written on its own line.
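
To make "contained only in the comparison source" concrete, here is a minimal sketch that uses in-memory arrays in place of files. The sample values are hypothetical and exist only for illustration.

```php
<?php

// Hypothetical sample data standing in for the lines of two files
$origin  = ['apple', 'banana', 'cherry', 'banana'];
$compare = ['banana', 'durian'];

// array_diff() returns the values of $origin that do not appear in $compare.
// Keys from $origin are preserved, so array_values() reindexes the result.
$onlyInOrigin = array_values(array_diff($origin, $compare));

// Print one element per line: "apple", then "cherry"
echo implode(PHP_EOL, $onlyInOrigin), PHP_EOL;
```

Note that "durian" is not reported: the comparison is one-directional, so values that exist only in the comparison target are ignored.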

Code

diffFileData.php
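<?php

// Both file paths are required
if ($argc < 3) {
    fwrite(STDERR, 'Usage: php diffFileData.php [comparison source file] [comparison target file]' . PHP_EOL);
    exit(1);
}

// Read the comparison source and target files into arrays (one element per line).
// FILE_IGNORE_NEW_LINES strips the line ending from each element, so a final line
// without a trailing newline still matches the same line in the other file.
$fileOrigin = file($argv[1], FILE_IGNORE_NEW_LINES);  // comparison source file
$fileCompare = file($argv[2], FILE_IGNORE_NEW_LINES); // comparison target file
if ($fileOrigin === false || $fileCompare === false) {
    fwrite(STDERR, 'Could not read one of the input files' . PHP_EOL);
    exit(1);
}

// Get the data contained only in the comparison source
$resultArray = array_diff($fileOrigin, $fileCompare);

// Format the result for output (one element per line)
$resultData = '';
foreach ($resultArray as $data) {
    $resultData .= $data . PHP_EOL;
}

// Result output file
$resultFile = 'only_in_' . $argv[1];

// Write the result
file_put_contents($resultFile, $resultData);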
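
One detail of file() matters for this kind of comparison: by default each array element keeps its line ending, so the last line of a file with no trailing newline does not compare equal to the same line followed by a newline. The sketch below shows the difference that the FILE_IGNORE_NEW_LINES flag makes. The temporary files and their contents are assumptions made up for the demonstration.

```php
<?php

// Temporary files created only for this illustration
$dir = sys_get_temp_dir();
$a = tempnam($dir, 'cmp');
$b = tempnam($dir, 'cmp');
file_put_contents($a, "x\ny");    // last line has no trailing newline
file_put_contents($b, "y\nz\n");  // here "y" is followed by a newline

// Default behavior: elements keep their line endings, so "y" and "y\n" differ
$raw = array_values(array_diff(file($a), file($b)));
// $raw is ["x\n", "y"]

// With FILE_IGNORE_NEW_LINES the line endings are stripped before comparing
$clean = array_values(array_diff(
    file($a, FILE_IGNORE_NEW_LINES),
    file($b, FILE_IGNORE_NEW_LINES)
));
// $clean is ["x"]

var_dump($raw, $clean);

// Remove the temporary files
unlink($a);
unlink($b);
```

When the line endings are stripped on input, each element needs a line break appended again when the result is written out.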

How to run

php diffFileData.php [comparison source file] [comparison target file]
# [Example] When the comparison source file is file1.txt and the comparison target file is file2.txt
php diffFileData.php file1.txt file2.txt

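Notes

The example above assumes that both the comparison source and comparison target files are placed in the same directory as the PHP program and that the command is run from that directory. The result is then written to only_in_file1.txt in the same directory.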
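
If the comparison source is passed with a directory part, prefixing the whole argument with only_in_ yields a path such as only_in_data/file1.txt, which points into a directory that most likely does not exist. Below is a sketch of one way to build the output path, assuming the result should be written next to the comparison source file. buildResultPath is a hypothetical helper name and is not part of the script above.

```php
<?php

// Hypothetical helper: place "only_in_<name>" in the same directory as the source file
function buildResultPath(string $originPath): string
{
    $dir = dirname($originPath);    // "." when the path has no directory part
    $name = basename($originPath);  // file name without the directory part
    return $dir . '/' . 'only_in_' . $name;
}

echo buildResultPath('file1.txt'), PHP_EOL;       // ./only_in_file1.txt
echo buildResultPath('data/file1.txt'), PHP_EOL;  // data/only_in_file1.txt
```

Whether the result belongs next to the source file or in the current directory is a design choice; for the latter, 'only_in_' . basename($argv[1]) alone would be enough.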

Discussion

andersenlab1

In my opinion, the provided PHP script offers a handy solution for comparing two files and extracting the data that appears only in the comparison source file. This can be particularly useful for data integration or data cleansing tasks.
The script is well structured and relies on PHP's built-in functions for file handling and array comparison. Quality assurance services may find this code favorable because it is short and built on standard PHP functions: https://andersenlab.com/services/quality-assurance-services
Here's a brief overview of the code:

  1. The script starts by reading the contents of both the comparison source and target files and stores them as arrays. This initial step is crucial for subsequent data comparison.
  2. It then employs the array_diff function to identify and extract data present only in the comparison source file. This function is a powerful tool for efficiently finding differences between arrays.
  3. The extracted data is nicely formatted for output, with each item separated by a line break. This makes the results easy to read and work with.
  4. The code generates a result file name based on the comparison source file. This ensures that the results are organized and easily identifiable.
  5. It uses the file_put_contents function to write the extracted data to the result file. This function simplifies the process of writing data to a file in PHP.
  6. To execute the script, you need to run it from the command line, specifying the comparison source and target files as arguments. This command-line approach allows for flexibility and automation when working with different file pairs.

As for running the script: if the comparison source and target files are in the same directory as the PHP program, navigate to that directory and run the command shown in the article. The script uses the file paths passed as arguments (file1.txt and file2.txt in the example) to perform the comparison and generate the result file. This setup makes it easy to run the script as part of a data quality assurance process.