
Aiming for Clean Code with pyscn


In this post, I looked into how to analyze the state of Python code using pyscn.

What is pyscn?

pyscn bills itself as a "Python Code Quality Analyzer": a tool for assessing the quality of Python code. Its main features are:

  • CFG-based dead code detection: finds unreachable code, such as branches after an if/else that can never execute
  • Clone detection with APTED and LSH: identifies refactoring candidates via tree edit distance
  • Coupling metrics (CBO): tracks architectural quality and dependencies between modules
  • Function complexity analysis: finds functions that should be decomposed

In short, it can spot code that is implemented but unnecessary, as well as parts that have become more complex than they need to be. For more details, see the official GitHub repository.

https://github.com/ludo-technologies/pyscn
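To make the first feature concrete, here is a hypothetical snippet (the function name and values are made up for this example) containing the kind of control-flow-level dead code a CFG-based detector is designed to flag: both branches return, so the final statement can never execute.

```python
def classify(score: float) -> str:
    # Both branches return, so control flow can never continue past the if/else.
    if score >= 0.5:
        return "positive"
    else:
        return "negative"
    print("done")  # dead code: unreachable on every path through the CFG

print(classify(0.7))  # → positive
```

A coverage run can miss this (the unreachable line simply never shows up as executed alongside 100%-covered branches elsewhere), which is why a static CFG analysis is a useful complement.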

Trying it out

Installation

The following installation methods are available. I installed it as a tool using uv.

# Install with pipx (recommended)
pipx install pyscn

# Or with uv
uv tool install pyscn

Running a detection pass!

Let's run it right away. The target is the code from yesterday's article comparing cuML and scikit-learn. The code below has no unreachable parts: every line is executed, so coverage would be 100%.

cuml_sklearn_comp.py
import time
from tqdm import tqdm
import matplotlib.pyplot as plt
from sklearn.svm import SVC as SKSVC
from cuml.svm import SVC as CUMLSVC
from sklearn.neighbors import KNeighborsClassifier as SKKNeighborsClassifier
from cuml.neighbors import KNeighborsClassifier as CUMLKNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier as SKRandomForestClassifier
from cuml.ensemble import RandomForestClassifier as CUMLRandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from cuml.model_selection import train_test_split


def create_dataset(n_samples, n_features, n_classes):
    X, y = make_classification(
        n_classes=n_classes, n_features=n_features, n_samples=n_samples, random_state=0
    )

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    return X_train, X_test, y_train, y_test


params = {"max_depth": 10}
skrf = SKRandomForestClassifier(**params)
cumlrf = CUMLRandomForestClassifier(**params)

params = {"n_neighbors": 10}
skknn = SKKNeighborsClassifier(**params)
cumlknn = CUMLKNeighborsClassifier(**params)


params = {"C": 1.1}
sksvc = SKSVC(**params)
cumlsvc = CUMLSVC(**params)


elapsed_times = {
    "rf": {"sklearn": [], "cuml": []},
    "knn": {"sklearn": [], "cuml": []},
    "svc": {"sklearn": [], "cuml": []},
}
accuracies = {
    "rf": {"sklearn": [], "cuml": []},
    "knn": {"sklearn": [], "cuml": []},
    "svc": {"sklearn": [], "cuml": []},
}
n_samples = (1_000, 10_000, 100_000)
model_params = {
    "rf": {"max_depth": 5},
    "knn": {"n_neighbors": 10},
    "svc": {"C": 1.1},
}
model_classes = {
    "rf": {
        "sklearn": SKRandomForestClassifier,
        "cuml": CUMLRandomForestClassifier,
    },
    "knn": {
        "sklearn": SKKNeighborsClassifier,
        "cuml": CUMLKNeighborsClassifier,
    },
    "svc": {
        "sklearn": SKSVC,
        "cuml": CUMLSVC,
    },
}

for _n_samples in tqdm(n_samples):
    X_train, X_test, y_train, y_test = create_dataset(
        _n_samples, n_features=10, n_classes=2
    )

    for model_name, _model_params in tqdm(model_params.items(), leave=False):
        sklearn_model = model_classes[model_name]["sklearn"](**_model_params)
        cuml_model = model_classes[model_name]["cuml"](**_model_params)

        # sklearnでモデル学習
        st = time.time()
        sklearn_model.fit(X_train, y_train)
        predict = sklearn_model.predict(X_test)
        accuracy = accuracy_score(y_test, predict)
        et = time.time()
        accuracies[model_name]["sklearn"].append(accuracy)
        elapsed_times[model_name]["sklearn"].append(et - st)

        # cuMLでモデル学習
        st = time.time()
        cuml_model.fit(X_train, y_train)
        predict = cuml_model.predict(X_test)
        accuracy = accuracy_score(y_test, predict)
        et = time.time()
        accuracies[model_name]["cuml"].append(accuracy)
        elapsed_times[model_name]["cuml"].append(et - st)


for i, model_name in enumerate(model_params.keys()):
    plt.subplot(3, 2, 2 * i + 1)
    plt.plot(n_samples, elapsed_times[model_name]["cuml"], color="red", label="cuML")
    plt.plot(
        n_samples, elapsed_times[model_name]["sklearn"], color="blue", label="sklearn"
    )
    plt.xlabel("n_samples")
    plt.ylabel("elapsed time[s]")
    plt.legend()
    plt.title(model_name)
    plt.subplot(3, 2, 2 * i + 2)
    plt.plot(n_samples, accuracies[model_name]["cuml"], color="red", label="cuML")
    plt.plot(
        n_samples, accuracies[model_name]["sklearn"], color="blue", label="sklearn"
    )
    plt.xlabel("n_samples")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.title(model_name)
    plt.subplots_adjust(wspace=0.5)

plt.savefig("output.png")

An analysis with pyscn is run with a command of the form pyscn analyze <option> <path> (for example, . for the current directory). Let's run it on this file.

pyscn analyze cuml_sklearn_comp.py

# 結果
Analyzing 100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (100/100, 18198 it/s)
📊 Unified HTML report generated and opened: /Users/user/Documents/Blog/blog_materials/python_lib/.pyscn/reports/analyze_20251118_221916.html

📊 Analysis Summary:
Health Score: 98/100 (Grade: A)
Total time: 5ms

📈 Detailed Scores:
  Complexity:     100/100 ✅  (avg: 0.0, high-risk: 0 functions)
  Dead Code:      100/100 ✅  (0 issues, 0 critical)
  Duplication:    100/100 ✅  (0.0% duplication, 0 groups)
  Coupling (CBO): 100/100 ✅  (avg: 0.0, 0/0 high-coupling)
  Dependencies:    83/100 👍  (no cycles, depth: 0)
  Architecture:   100/100 ✅  (100% compliant)

✅ Complexity Analysis: 0 functions analyzed
✅ Dead Code Detection: Completed
✅ Clone Detection: Completed
✅ Class Coupling: 0 classes analyzed
✅ System Analysis: 1 modules analyzed

Running this also opens the generated report in your browser (as the log shows, the file is saved at /Users/user/Documents/Blog/blog_materials/python_lib/.pyscn/reports/analyze_20251118_221916.html).

Looking at the results, the Dependencies score drops slightly, but the overall grade is A.

Analysis options

The pyscn analyze command accepts the following options to change the output format and the analysis scope:

  • --json: emit the results as JSON
  • --select: restrict the analysis to specific items (for example, specifying Complexity analyzes only Complexity)

Running the earlier example with JSON output:

pyscn analyze --json cuml_sklearn_comp.py

The console output is the same; the generated JSON file looks like the following.

Analysis results
{
  "complexity": {
    "Functions": [],
    "Summary": {
      "TotalFunctions": 0,
      "AverageComplexity": 0,
      "MaxComplexity": 0,
      "MinComplexity": 0,
      "FilesAnalyzed": 1,
      "LowRiskFunctions": 0,
      "MediumRiskFunctions": 0,
      "HighRiskFunctions": 0,
      "ComplexityDistribution": null
    },
    "Warnings": null,
    "Errors": null,
    "GeneratedAt": "2025-11-18T22:24:47+09:00",
    "Version": "1.1.1",
    "Config": {
      "exclude_patterns": [
        "test_*.py",
        "*_test.py"
      ],
      "include_patterns": [
        "**/*.py"
      ],
      "low_threshold": 9,
      "max_complexity": 0,
      "medium_threshold": 19,
      "min_complexity": 5,
      "output_format": "json",
      "recursive": false,
      "show_details": false,
      "sort_by": "complexity"
    }
  },
  "dead_code": {
    "files": null,
    "summary": {
      "total_files": 1,
      "total_functions": 0,
      "total_findings": 0,
      "files_with_dead_code": 0,
      "functions_with_dead_code": 0,
      "critical_findings": 0,
      "warning_findings": 0,
      "info_findings": 0,
      "findings_by_reason": {},
      "total_blocks": 0,
      "dead_blocks": 0,
      "overall_dead_ratio": 0
    },
    "warnings": null,
    "errors": null,
    "generated_at": "2025-11-18T22:24:47+09:00",
    "version": "1.1.1",
    "config": {
      "context_lines": 0,
      "detect_after_break": false,
      "detect_after_continue": false,
      "detect_after_raise": false,
      "detect_after_return": false,
      "detect_unreachable_branches": false,
      "exclude_patterns": [
        "test_*.py",
        "*_test.py"
      ],
      "ignore_patterns": [],
      "include_patterns": [
        "**/*.py"
      ],
      "min_severity": "warning",
      "show_context": false,
      "sort_by": "severity"
    }
  },
  "clone": {
    "clones": [
      {
        "id": 1,
        "type": 0,
        "location": {
          "file_path": "cuml_sklearn_comp.py",
          "start_line": 69,
          "end_line": 94,
          "start_col": 0,
          "end_col": 57
        },
        "hash": "",
        "size": 19,
        "line_count": 26,
        "complexity": 0
      },
      {
        "id": 2,
        "type": 0,
        "location": {
          "file_path": "cuml_sklearn_comp.py",
          "start_line": 74,
          "end_line": 94,
          "start_col": 4,
          "end_col": 57
        },
        "hash": "",
        "size": 17,
        "line_count": 21,
        "complexity": 0
      },
      {
        "id": 3,
        "type": 0,
        "location": {
          "file_path": "cuml_sklearn_comp.py",
          "start_line": 97,
          "end_line": 116,
          "start_col": 0,
          "end_col": 35
        },
        "hash": "",
        "size": 16,
        "line_count": 20,
        "complexity": 0
      }
    ],
    "clone_pairs": null,
    "clone_groups": null,
    "statistics": {
      "total_clones": 3,
      "total_clone_pairs": 0,
      "total_clone_groups": 0,
      "clones_by_type": {},
      "average_similarity": 0,
      "lines_analyzed": 119,
      "files_analyzed": 1
    },
    "request": {
      "paths": [
        "cuml_sklearn_comp.py"
      ],
      "recursive": false,
      "include_patterns": [
        "**/*.py"
      ],
      "exclude_patterns": [
        "test_*.py",
        "*_test.py"
      ],
      "min_lines": 5,
      "min_nodes": 10,
      "similarity_threshold": 0.8,
      "max_edit_distance": 0,
      "ignore_literals": false,
      "ignore_identifiers": false,
      "type1_threshold": 0.95,
      "type2_threshold": 0.85,
      "type3_threshold": 0.8,
      "type4_threshold": 0.75,
      "output_format": "json",
      "output_path": "",
      "no_open": false,
      "show_details": false,
      "show_content": false,
      "sort_by": "",
      "group_clones": true,
      "group_mode": "",
      "group_threshold": 0,
      "k_core_k": 0,
      "min_similarity": 0,
      "max_similarity": 1,
      "clone_types": [
        1,
        2,
        3,
        4
      ],
      "config_path": "",
      "timeout": 0,
      "lsh_enabled": "auto",
      "lsh_auto_threshold": 500,
      "lsh_similarity_threshold": 0.5,
      "lsh_bands": 32,
      "lsh_rows": 4,
      "lsh_hashes": 128
    },
    "duration_ms": 8,
    "success": true
  },
  "cbo": {
    "Classes": [],
    "Summary": {
      "TotalClasses": 0,
      "AverageCBO": 0,
      "MaxCBO": 0,
      "MinCBO": 0,
      "ClassesAnalyzed": 0,
      "FilesAnalyzed": 1,
      "LowRiskClasses": 0,
      "MediumRiskClasses": 0,
      "HighRiskClasses": 0,
      "CBODistribution": null,
      "MostCoupledClasses": null,
      "MostDependedUponClasses": null
    },
    "Warnings": [
      "[cuml_sklearn_comp.py] No classes found in file",
      "No classes found to analyze"
    ],
    "Errors": null,
    "GeneratedAt": "2025-11-18T22:24:47+09:00",
    "Version": "1.1.1",
    "Config": {
      "includeBuiltins": false,
      "includeImports": false,
      "lowThreshold": 5,
      "maxCBO": 0,
      "mediumThreshold": 10,
      "minCBO": 0,
      "outputFormat": "json",
      "showZeros": false,
      "sortBy": "coupling"
    }
  },
  "system": {
    "DependencyAnalysis": {
      "TotalModules": 1,
      "TotalDependencies": 0,
      "RootModules": [
        "python_lib.cuml_sklearn_comp"
      ],
      "LeafModules": [
        "python_lib.cuml_sklearn_comp"
      ],
      "ModuleMetrics": {
        "python_lib.cuml_sklearn_comp": {
          "ModuleName": "python_lib.cuml_sklearn_comp",
          "Package": "python_lib",
          "FilePath": "/Users/user/Documents/Blog/blog_materials/python_lib/cuml_sklearn_comp.py",
          "IsPackage": false,
          "LinesOfCode": 119,
          "FunctionCount": 1,
          "ClassCount": 0,
          "PublicInterface": [
            "create_dataset"
          ],
          "AfferentCoupling": 0,
          "EfferentCoupling": 0,
          "Instability": 0,
          "Abstractness": 0,
          "Distance": 1,
          "Maintainability": 0,
          "TechnicalDebt": 0,
          "RiskLevel": "high",
          "DirectDependencies": null,
          "TransitiveDependencies": null,
          "Dependents": null
        }
      },
      "DependencyMatrix": {
        "python_lib.cuml_sklearn_comp": {}
      },
      "CircularDependencies": {
        "HasCircularDependencies": false,
        "TotalCycles": 0,
        "TotalModulesInCycles": 0,
        "CircularDependencies": null,
        "CycleBreakingSuggestions": null,
        "CoreInfrastructure": null
      },
      "CouplingAnalysis": {
        "AverageCoupling": 0,
        "CouplingDistribution": null,
        "HighlyCoupledModules": null,
        "LooselyCoupledModules": null,
        "AverageInstability": 0,
        "StableModules": null,
        "InstableModules": null,
        "MainSequenceDeviation": 1,
        "ZoneOfPain": [
          "python_lib.cuml_sklearn_comp"
        ],
        "ZoneOfUselessness": [],
        "MainSequence": null
      },
      "LongestChains": null,
      "MaxDepth": 0
    },
    "ArchitectureAnalysis": {
      "ComplianceScore": 1,
      "TotalViolations": 0,
      "TotalRules": 0,
      "LayerAnalysis": {
        "LayersAnalyzed": 0,
        "LayerViolations": [],
        "LayerCoupling": {},
        "LayerCohesion": {},
        "ProblematicLayers": []
      },
      "CohesionAnalysis": null,
      "ResponsibilityAnalysis": null,
      "Violations": [],
      "SeverityBreakdown": {},
      "Recommendations": [],
      "RefactoringTargets": []
    },
    "Summary": {
      "TotalModules": 0,
      "TotalPackages": 0,
      "TotalDependencies": 0,
      "ProjectRoot": "",
      "OverallQualityScore": 0,
      "MaintainabilityScore": 0,
      "ArchitectureScore": 0,
      "ModularityScore": 0,
      "TechnicalDebtHours": 0,
      "AverageCoupling": 0,
      "AverageInstability": 0,
      "CyclicDependencies": 0,
      "ArchitectureViolations": 0,
      "HighRiskModules": 0,
      "CriticalIssues": 0,
      "RefactoringCandidates": 0,
      "ArchitectureImprovements": 0
    },
    "Issues": null,
    "Recommendations": null,
    "Warnings": null,
    "Errors": null,
    "GeneratedAt": "2025-11-18T22:24:47.336038+09:00",
    "Duration": 9,
    "Version": "1.1.1",
    "Config": null
  },
  "summary": {
    "total_files": 1,
    "analyzed_files": 1,
    "skipped_files": 0,
    "complexity_enabled": true,
    "dead_code_enabled": true,
    "clone_enabled": true,
    "cbo_enabled": true,
    "deps_enabled": true,
    "arch_enabled": true,
    "deps_total_modules": 1,
    "deps_modules_in_cycles": 0,
    "deps_max_depth": 0,
    "deps_main_sequence_deviation": 1,
    "arch_compliance": 1,
    "total_functions": 0,
    "average_complexity": 0,
    "high_complexity_count": 0,
    "dead_code_count": 0,
    "critical_dead_code": 0,
    "total_clones": 3,
    "clone_pairs": 0,
    "clone_groups": 0,
    "code_duplication_percentage": 0,
    "cbo_classes": 0,
    "high_coupling_classes": 0,
    "medium_coupling_classes": 0,
    "average_coupling": 0,
    "health_score": 98,
    "grade": "A",
    "complexity_score": 100,
    "dead_code_score": 100,
    "duplication_score": 100,
    "coupling_score": 100,
    "dependency_score": 83,
    "architecture_score": 100
  },
  "generated_at": "2025-11-18T22:24:47.336532+09:00",
  "duration_ms": 12,
  "version": "1.1.1"
}

Restricting the analysis to Complexity produced the following result.

pyscn analyze --json --select Complexity cuml_sklearn_comp.py

# 結果
Analyzing 100% |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| (100/100, 13420 it/s)
📊 Unified JSON report generated: /Users/user/Documents/Blog/blog_materials/python_lib/.pyscn/reports/analyze_20251118_222545.json

📊 Analysis Summary:
Health Score: 100/100 (Grade: A)
Total time: 8ms

📈 Detailed Scores:
  Complexity:     100/100 ✅  (avg: 0.0, high-risk: 0 functions)

✅ Complexity Analysis: 0 functions analyzed

Wrap-up

In this post I tried analyzing Python code quality with pyscn. It makes code analysis easy, such as checking for unnecessary code, so I plan to incorporate it into CI and keep working toward writing high-quality Python code.

Discussion

Akasan

On 2025/11/18, the day I wrote this article, the Cloudflare outage caused connectivity problems for Zenn as well, so I couldn't sync with GitHub that day and the article ended up being published on 2025/11/19. Since I did write it on consecutive days, I'm counting my streak as unbroken (it wasn't my fault, after all!).