
CodeIgniter4: A CSV containing a specific character fails ext_in and cannot be loaded by PhpSpreadsheet

Published 2024/02/01

Environment

  • PHP 7.4.33
  • CodeIgniter 4.4.5
  • PhpSpreadsheet 1.29.0

The problematic CSV file

In this case, the problem occurred when the first character of the CSV file was 「契」, which is 8C5F in hexadecimal (Shift_JIS).

  • Character encoding: SJIS
  • Line endings: CRLF
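To check the detection outside the framework, a file matching these conditions can be generated and passed to finfo, the same extension that CodeIgniter's File::getMimeType() calls. This is a minimal sketch: the row contents after 「契」 and the helper names writeSampleCsv() and detectMimeType() are hypothetical, and the reported MIME type depends on the libmagic database bundled with your PHP build, so it may or may not match the application/x-dosexec seen in this case.

```php
<?php
// Reproduction sketch (assumption: the fileinfo extension is enabled).
// writeSampleCsv() and detectMimeType() are hypothetical helper names.

// Writes a small SJIS/CRLF CSV whose first character is 「契」 (bytes 0x8C 0x5F)
// to a temporary file and returns its path.
function writeSampleCsv(): string
{
    // "\x8C\x5F" is 「契」 in Shift_JIS; the rest of the content is ASCII.
    $content = "\x8C\x5F,123\r\nabc,456\r\n";
    $path = tempnam(sys_get_temp_dir(), 'csv');
    if ($path === false || file_put_contents($path, $content) === false) {
        throw new RuntimeException('Could not write the sample file.');
    }
    return $path;
}

// Same finfo call sequence as CodeIgniter's File::getMimeType().
function detectMimeType(string $path): string
{
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    $mimeType = finfo_file($finfo, $path);
    finfo_close($finfo);
    return (string) $mimeType;
}

$path = writeSampleCsv();
// The printed type depends on the installed libmagic database.
echo detectMimeType($path), PHP_EOL;
unlink($path);
```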

Validation rule configuration

app\Config\Validation.php
public array $valid_upload = [
    'field_name' => [
        'rules' => 'uploaded[field_name]|ext_in[field_name,csv]',
    ],
];

Investigation

I traced through the core code of the framework and the library step by step.

vendor\codeigniter4\framework\system\Validation\FileRules.php
public function ext_in(?string $blank, string $params): bool
{
    $params = explode(',', $params);
    $name   = array_shift($params);

    if (! ($files = $this->request->getFileMultiple($name))) {
        $files = [$this->request->getFile($name)];
    }

    foreach ($files as $file) {
        if ($file === null) {
            return false;
        }

        if ($file->getError() === UPLOAD_ERR_NO_FILE) {
            return true;
        }

        if (! in_array($file->guessExtension(), $params, true)) {
            return false;
        }
    }

    return true;
}
vendor\codeigniter4\framework\system\HTTP\Files\UploadedFile.php
public function guessExtension(): string
{
    return Mimes::guessExtensionFromType($this->getMimeType(), $this->getClientExtension()) ?? '';
}
vendor\codeigniter4\framework\system\Files\File.php
public function getMimeType(): string
{
    if (! function_exists('finfo_open')) {
        return $this->originalMimeType ?? 'application/octet-stream'; // @codeCoverageIgnore
    }

    $finfo    = finfo_open(FILEINFO_MIME_TYPE);
    $mimeType = finfo_file($finfo, $this->getRealPath() ?: $this->__toString());
    finfo_close($finfo);

    return $mimeType;
}

Normally, getMimeType returns text/plain.
In this case it returned application/x-dosexec, so guessExtension returned exe and the ext_in validation failed.
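Why does application/x-dosexec become exe even though the uploaded file was named .csv? The sketch below roughly follows the logic of Config\Mimes::guessExtensionFromType() in CodeIgniter 4.4: the client extension is kept only if the detected MIME type is listed for it; otherwise the first extension whose list contains the MIME type is returned. It is a standalone reimplementation for illustration, and MIME_TABLE is an abbreviated, illustrative subset, not the framework's full list.

```php
<?php
// Standalone sketch of the extension-guessing logic (not the framework code).
// MIME_TABLE is an abbreviated, illustrative subset (assumption).
const MIME_TABLE = [
    'csv' => ['text/csv', 'text/x-comma-separated-values', 'text/plain', 'application/vnd.ms-excel'],
    'exe' => ['application/octet-stream', 'application/x-dosexec', 'application/x-msdownload'],
];

function guessExtensionFromType(string $type, ?string $proposedExtension = null): ?string
{
    $type = trim(strtolower($type), '. ');
    $proposedExtension = trim(strtolower($proposedExtension ?? ''));

    // 1. Keep the client's extension only if the detected MIME type is listed for it.
    if ($proposedExtension !== ''
        && array_key_exists($proposedExtension, MIME_TABLE)
        && in_array($type, MIME_TABLE[$proposedExtension], true)) {
        return $proposedExtension;
    }

    // 2. Otherwise return the first extension whose list contains the MIME type.
    foreach (MIME_TABLE as $ext => $types) {
        if (in_array($type, $types, true)) {
            return $ext;
        }
    }

    return null;
}

var_dump(guessExtensionFromType('text/plain', 'csv'));            // string(3) "csv"
var_dump(guessExtensionFromType('application/x-dosexec', 'csv')); // string(3) "exe"
```

With the detected type being application/x-dosexec, the client's .csv extension is discarded, which is why ext_in[field_name,csv] rejects the file.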

Next, the CSV reading process that uses PhpSpreadsheet.

app\Libraries\CsvReader.php
namespace App\Libraries;

use Exception;
use PhpOffice\PhpSpreadsheet\Reader\Csv;
use RuntimeException;

class CsvReader
{
    /**
     * @param \CodeIgniter\HTTP\Files\UploadedFile $file
     * @return array
     */
    public function read($file): array
    {
        if ($file->isValid() && !$file->hasMoved()) {
            $reader = new Csv();
            $reader->setInputEncoding('SJIS');
            try {
                $spreadsheet = $reader->load($file);
            } catch (Exception $e) {
                throw new RuntimeException($e->getMessage());
            }
            return $spreadsheet->getActiveSheet()->toArray();
        }
        throw new RuntimeException($file->getErrorString());
    }
}

Calling PhpSpreadsheet's load method threw the exception "<filename> is an Invalid Spreadsheet file."
Next, I tracked down the cause.

vendor\phpoffice\phpspreadsheet\src\PhpSpreadsheet\Reader\BaseReader.php
public function load(string $filename, int $flags = 0): Spreadsheet
{
    $this->processFlags($flags);

    try {
        return $this->loadSpreadsheetFromFile($filename);
    } catch (ReaderException $e) {
        throw $e;
    }
}
vendor\phpoffice\phpspreadsheet\src\PhpSpreadsheet\Reader\Csv.php
protected function loadSpreadsheetFromFile(string $filename): Spreadsheet
{
    // Create new Spreadsheet
    $spreadsheet = new Spreadsheet();

    // Load into this instance
    return $this->loadIntoExisting($filename, $spreadsheet);
}
vendor\phpoffice\phpspreadsheet\src\PhpSpreadsheet\Reader\Csv.php
public function loadIntoExisting(string $filename, Spreadsheet $spreadsheet): Spreadsheet
{
    return $this->loadStringOrFile($filename, $spreadsheet, false);
}
vendor\phpoffice\phpspreadsheet\src\PhpSpreadsheet\Reader\Csv.php
private function loadStringOrFile(string $filename, Spreadsheet $spreadsheet, bool $dataUri): Spreadsheet
{
    // Deprecated in Php8.1
    $iniset = $this->setAutoDetect('1');

    // Open file
    if ($dataUri) {
        $this->openDataUri($filename);
    } else {
        $this->openFileOrMemory($filename);
    }
    $fileHandle = $this->fileHandle;

    // Skip BOM, if any
    $this->skipBOM();
    $this->checkSeparator();
    $this->inferSeparator();

    // Create new PhpSpreadsheet object
    while ($spreadsheet->getSheetCount() <= $this->sheetIndex) {
        $spreadsheet->createSheet();
    }
    $sheet = $spreadsheet->setActiveSheetIndex($this->sheetIndex);

    // Set our starting row based on whether we're in contiguous mode or not
    $currentRow = 1;
    $outRow = 0;

    // Loop through each line of the file in turn
    $rowData = fgetcsv($fileHandle, 0, $this->delimiter ?? '', $this->enclosure, $this->escapeCharacter);
    $valueBinder = Cell::getValueBinder();
    $preserveBooleanString = method_exists($valueBinder, 'getBooleanConversion') && $valueBinder->getBooleanConversion();
    while (is_array($rowData)) {
        $noOutputYet = true;
        $columnLetter = 'A';
        foreach ($rowData as $rowDatum) {
            $this->convertBoolean($rowDatum, $preserveBooleanString);
            $numberFormatMask = $this->convertFormattedNumber($rowDatum);
            if (($rowDatum !== '' || $this->preserveNullString) && $this->readFilter->readCell($columnLetter, $currentRow)) {
                if ($this->contiguous) {
                    if ($noOutputYet) {
                        $noOutputYet = false;
                        ++$outRow;
                    }
                } else {
                    $outRow = $currentRow;
                }
                // Set basic styling for the value (Note that this could be overloaded by styling in a value binder)
                $sheet->getCell($columnLetter . $outRow)->getStyle()
                    ->getNumberFormat()
                    ->setFormatCode($numberFormatMask);
                // Set cell value
                $sheet->getCell($columnLetter . $outRow)->setValue($rowDatum);
            }
            ++$columnLetter;
        }
        $rowData = fgetcsv($fileHandle, 0, $this->delimiter ?? '', $this->enclosure, $this->escapeCharacter);
        ++$currentRow;
    }

    // Close file
    fclose($fileHandle);

    $this->setAutoDetect($iniset);

    // Return
    return $spreadsheet;
}
vendor\phpoffice\phpspreadsheet\src\PhpSpreadsheet\Reader\Csv.php
private function openFileOrMemory(string $filename): void
{
    // Open file
    $fhandle = $this->canRead($filename);
    if (!$fhandle) {
        throw new Exception($filename . ' is an Invalid Spreadsheet file.');
    }
    if ($this->inputEncoding === self::GUESS_ENCODING) {
        $this->inputEncoding = self::guessEncoding($filename, $this->fallbackEncoding);
    }
    $this->openFile($filename);
    if ($this->inputEncoding !== 'UTF-8') {
        fclose($this->fileHandle);
        $entireFile = file_get_contents($filename);
        $fileHandle = fopen('php://memory', 'r+b');
        if ($fileHandle !== false && $entireFile !== false) {
            $this->fileHandle = $fileHandle;
            $data = StringHelper::convertEncoding($entireFile, 'UTF-8', $this->inputEncoding);
            fwrite($this->fileHandle, $data);
            $this->skipBOM();
        }
    }
}
vendor\phpoffice\phpspreadsheet\src\PhpSpreadsheet\Reader\Csv.php
public function canRead(string $filename): bool
{
    // Check if file exists
    try {
        $this->openFile($filename);
    } catch (ReaderException $e) {
        return false;
    }

    fclose($this->fileHandle);

    // Trust file extension if any
    $extension = strtolower(/** @scrutinizer ignore-type */ pathinfo($filename, PATHINFO_EXTENSION));
    if (in_array($extension, ['csv', 'tsv'])) {
        return true;
    }

    // Attempt to guess mimetype
    $type = mime_content_type($filename);
    $supportedTypes = [
        'application/csv',
        'text/csv',
        'text/plain',
        'inode/x-empty',
    ];

    return in_array($type, $supportedTypes, true);
}

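The mime_content_type function used for MIME detection returned application/x-dosexec, so canRead returned false and the exception was thrown.
The extension shortcut did not apply because the path passed to load is the upload's temporary file, which has no .csv extension.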
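The deciding branch can be isolated as below. This sketch mirrors the tail of Csv::canRead() shown above, except that the MIME detector is passed in as a callable, an assumption made so the logic runs without real files; in the library it is mime_content_type($filename). The function name csvCanRead() is hypothetical.

```php
<?php
// Sketch of the decision in PhpSpreadsheet's Csv::canRead() after the file opens.
// csvCanRead() is a hypothetical name; $detectMime stands in for mime_content_type().
function csvCanRead(string $filename, callable $detectMime): bool
{
    // Trust the extension if it is csv or tsv.
    $extension = strtolower((string) pathinfo($filename, PATHINFO_EXTENSION));
    if (in_array($extension, ['csv', 'tsv'], true)) {
        return true;
    }

    // Otherwise fall back to MIME detection.
    $supportedTypes = ['application/csv', 'text/csv', 'text/plain', 'inode/x-empty'];
    return in_array($detectMime($filename), $supportedTypes, true);
}

// Simulates the detection result seen in this case.
$dosexec = function (string $path): string {
    return 'application/x-dosexec';
};

// An extension-less temp path falls through to the MIME check and is rejected.
var_dump(csvCanRead('/tmp/phpA1b2C3', $dosexec));        // bool(false)
// A path ending in .csv is accepted before the MIME check runs.
var_dump(csvCanRead('writable/cache/sample.csv', $dosexec)); // bool(true)
```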

Fix

This time I stopped using the ext_in validation rule and replaced it with a custom implementation.
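The custom rule itself is not shown in this article. One possible approach, sketched here as a hypothetical helper clientExtensionIn(), is to check the extension of the client-supplied filename instead of the finfo-detected MIME type. In CodeIgniter 4 that extension is available from UploadedFile::getClientExtension(); a plain filename string is used here so the sketch runs without the framework, and registering it as a rule (for example, as a method of a class listed in $ruleSets in app\Config\Validation.php) is left out.

```php
<?php
// Hypothetical replacement check for ext_in: compare the client filename's
// extension (case-insensitively) against an allow-list, ignoring the MIME type.
function clientExtensionIn(string $clientName, array $allowed): bool
{
    $extension = strtolower((string) pathinfo($clientName, PATHINFO_EXTENSION));
    return in_array($extension, array_map('strtolower', $allowed), true);
}

var_dump(clientExtensionIn('contracts.csv', ['csv'])); // bool(true)
var_dump(clientExtensionIn('setup.exe', ['csv']));     // bool(false)
```

Because the client controls the filename, this check is weaker than content sniffing; pairing it with a size limit or validation of the parsed rows may be worth considering.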
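For PhpSpreadsheet, I changed the CSV reading implementation as follows so that the file is accepted at the "Trust file extension if any" check: the upload is moved to a file with a .csv extension before loading.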

app\Libraries\CsvReader.php
namespace App\Libraries;

use Exception;
use PhpOffice\PhpSpreadsheet\Reader\Csv;
use RuntimeException;

class CsvReader
{
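    /**
     * @param \CodeIgniter\HTTP\Files\UploadedFile $file
     * @return array
     */
    public function read($file): array
    {
        if ($file->isValid() && !$file->hasMoved()) {
            // Drop whatever extension getRandomName() adds and force .csv,
            // so that Csv::canRead() trusts the extension instead of sniffing the MIME type
            $new_name = pathinfo($file->getRandomName(), PATHINFO_FILENAME) . '.csv';
            $file->move(WRITEPATH . 'cache', $new_name);
            $path = WRITEPATH . 'cache' . DIRECTORY_SEPARATOR . $new_name;
            $reader = new Csv();
            $reader->setInputEncoding('SJIS');
            try {
                $spreadsheet = $reader->load($path);
            } catch (Exception $e) {
                throw new RuntimeException($e->getMessage(), 0, $e);
            } finally {
                // The data is in memory now, so remove the moved copy
                if (is_file($path)) {
                    unlink($path);
                }
            }
            return $spreadsheet->getActiveSheet()->toArray();
        }
        throw new RuntimeException($file->getErrorString());
    }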
}
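An alternative that avoids moving the upload into writable/cache is to copy its temporary file to a path ending in .csv, run the loader inside a callback, and delete the copy in a finally block. This is a sketch under the assumption that a short-lived copy in the system temp directory is acceptable; the helper name withCsvExtension() is hypothetical.

```php
<?php
// Hypothetical helper: expose a file under a temporary ".csv" path for the
// duration of $callback, then delete the copy even if the callback throws.
function withCsvExtension(string $sourcePath, callable $callback)
{
    $tmpBase = tempnam(sys_get_temp_dir(), 'csv');
    if ($tmpBase === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }
    $csvPath = $tmpBase . '.csv';
    try {
        if (!copy($sourcePath, $csvPath)) {
            throw new RuntimeException('Could not copy the source file.');
        }
        return $callback($csvPath);
    } finally {
        // Remove both the .csv copy and the placeholder created by tempnam().
        foreach ([$csvPath, $tmpBase] as $p) {
            if (is_file($p)) {
                unlink($p);
            }
        }
    }
}
```

In CsvReader::read(), the source path could be taken from $file->getTempName(), and the callback would call $reader->load() on the .csv path it receives.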
