
GenAI-Tutorial

Published on 2023/06/01

genai|PyPI

This tutorial runs in Google Colaboratory.

Install GenAI

pip install genai

An error occurs

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.13.2 which is incompatible.
Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 asttokens-2.2.1 async-timeout-4.0.2 executing-1.2.0 frozenlist-1.3.3 genai-2.0.0 ipython-8.13.2 jedi-0.18.2 multidict-6.0.4 openai-0.27.7 pure-eval-0.2.2 stack-data-0.6.2 tabulate-0.9.0 tiktoken-0.3.3 vdom-0.6 yarl-1.9.2
WARNING: Upgrading ipython, ipykernel, tornado, prompt-toolkit or pyzmq can
cause your runtime to repeatedly crash or behave in unexpected ways and is not
recommended. If your runtime won't connect or execute code, you can reset it
with "Disconnect and delete runtime" from the "Runtime" menu.
WARNING: The following packages were previously imported in this runtime:
  [IPython]
You must restart the runtime in order to use newly installed versions.

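This message reports a dependency conflict that arises when installing the package with pip in the Google Colaboratory runtime. Installing genai upgrades ipython to 8.13.2, but the preinstalled google-colab package requires ipython 7.34.0, so the two requirements are incompatible. The log still ends with "Successfully installed", so genai 2.0.0 itself was installed; the final warning only asks for a runtime restart so that the newly installed IPython is picked up.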
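To see which side of the conflict the runtime is currently on, the installed version can be compared with the pin from the error message. The following is a minimal sketch using only the standard library; `pin_status` is a hypothetical helper written for this article, not part of pip, genai, or google-colab.

```python
from importlib import metadata


def pin_status(package, required, lookup=metadata.version):
    """Compare the installed version of `package` with a pinned version.

    `lookup` defaults to importlib.metadata.version and can be swapped
    out (for example in tests) with any callable that returns a version string.
    """
    try:
        installed = lookup(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    if installed == required:
        return f"{package} {installed} matches the pin"
    return f"{package} {installed} conflicts with the pin =={required}"


# The pin below is taken from the error message (google-colab requires ipython==7.34.0).
print(pin_status("ipython", "7.34.0"))
```

The printed line depends on what is installed in the current runtime, so no fixed output is shown here.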

Restart the runtime and run the command again.

pip install genai

Get an OPENAI_API_KEY

https://platform.openai.com/account/api-keys

Create an OpenAI API key on this page.

Handle the key with great care: do not share it or leave it in a notebook that other people can see.

Set OPENAI_API_KEY as an environment variable

import os
os.environ['OPENAI_API_KEY'] = 'YOUR_API_KEY'

The environment variable is lost when the Colaboratory session ends, so the API key has to be set again each time you reopen the notebook.
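Because the key has to be re-entered every session, it may be more convenient (and safer than pasting it into a cell) to read it from a hidden prompt. This is a minimal sketch using the standard-library `getpass`; `ensure_api_key` is a hypothetical helper name, not something provided by genai or openai.

```python
import os
from getpass import getpass


def ensure_api_key(env_var="OPENAI_API_KEY", prompt=getpass):
    """Set `env_var` from a hidden prompt unless it is already set.

    `prompt` defaults to getpass.getpass, which does not echo the input,
    so the key never appears in the notebook output.
    """
    if not os.environ.get(env_var):
        # Strip stray whitespace that often comes along when pasting a key.
        os.environ[env_var] = prompt(f"Enter {env_var}: ").strip()
    return os.environ[env_var]
```

Calling `ensure_api_key()` in a cell asks for the key once per session and leaves an existing value untouched.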

Load the IPython extension

%load_ext genai

Features

  • %%assist magic command that generates code from natural language
  • Custom exception suggestions

Custom Exception Suggestions

import pandas as pd

df = pd.DataFrame(dict(col1=['a', 'b', 'c']), index=['first', 'second', 'third'])

df.sort_values()

↓ Output

TypeError                                 Traceback (most recent call last)
Cell In[5], line 5
      1 import pandas as pd
      3 df = pd.DataFrame(dict(col1=['a', 'b', 'c']), index=['first', 'second', 'third'])
----> 5 df.sort_values()

File /usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

TypeError: DataFrame.sort_values() missing 1 required positional argument: 'by'

↓ The AI suggests a fix and an example implementation

Suggestion
The error message shows that the sort_values() method is missing a required argument 'by'. You need to specify the column by which you want to sort the DataFrame. Try updating your code like this:

import pandas as pd

df = pd.DataFrame(dict(col1=['a', 'b', 'c']), index=['first', 'second', 'third'])

df_sorted = df.sort_values(by='col1')

This sorts the DataFrame by the 'col1' column and assigns it to a new variable df_sorted.

Example

%%assist

Can you explain this query to me so I can be sure we're doing the right things?

%%sql
    SELECT
      COUNT(*) AS num_downloads,
      DATE_TRUNC(DATE(timestamp), DAY) AS day
    FROM `bigquery-public-data.pypi.file_downloads`
    WHERE
      file.project = 'genai'
      -- Only query the last 6 months of history
      AND DATE(timestamp)
        BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH)
        AND CURRENT_DATE()
    GROUP BY day
    ORDER BY day DESC

↓ Output

This SQL query retrieves the number of downloads of all files associated with the project genai from the public pypi dataset in BigQuery over the last 6 months. It then groups these downloads by day and orders them in descending order.
Here's a step-by-step breakdown of the query:
SELECT COUNT(*) AS num_downloads: selects and counts all of the rows in the table and gives the resulting column an alias of num_downloads.
DATE_TRUNC(DATE(timestamp), DAY) AS day: truncates the timestamp down to the day and labels this resulting column as day.
FROM `bigquery-public-data.pypi.file_downloads`: specifies the table to query.
WHERE file.project = 'genai': filters only the files that are associated with the genai project.
AND DATE(timestamp) BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH) AND CURRENT_DATE(): filters only the files that were downloaded in the last 6 months. This uses several date functions to retrieve the date range: DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH) calculates the date 6 months ago, DATE_TRUNC(..., MONTH) returns the first day of the month for a given date, and then DATE(...) casts that to a date value to match the datatype of the timestamp field. Lastly, BETWEEN is used to select all records where the timestamp is between the truncated date 6 months ago and the current date.
GROUP BY day: groups the resulting count of downloads per day together to create a summary.
ORDER BY day DESC: orders the results by the day in descending order so that the most recent day is at the top.
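The date window in the WHERE clause can be reproduced in plain Python to check which days the query actually covers. This is only a sketch of the same arithmetic, not how BigQuery evaluates it; `six_month_window` is a hypothetical helper introduced for illustration.

```python
from datetime import date


def six_month_window(today):
    """Return (start, end) matching the query's BETWEEN range.

    start mirrors DATE_TRUNC(DATE_SUB(today, INTERVAL 6 MONTH), MONTH):
    go back six calendar months, then snap to the first day of that month.
    end mirrors CURRENT_DATE().
    """
    # Count months since year 0 so that subtracting 6 crosses year boundaries cleanly.
    months = today.year * 12 + (today.month - 1) - 6
    start = date(months // 12, months % 12 + 1, 1)
    return start, today


start, end = six_month_window(date(2023, 6, 1))
print(start.isoformat(), end.isoformat())  # prints "2022-12-01 2023-06-01"
```

Because the start is truncated to the first of the month, the window is usually somewhat longer than six months: a query run on 2023-08-31 would start at 2023-02-01.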
