GenAI-Tutorial
We'll do everything in Google Colaboratory.
Install GenAI
pip install genai
An error occurs:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.13.2 which is incompatible.
Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 asttokens-2.2.1 async-timeout-4.0.2 executing-1.2.0 frozenlist-1.3.3 genai-2.0.0 ipython-8.13.2 jedi-0.18.2 multidict-6.0.4 openai-0.27.7 pure-eval-0.2.2 stack-data-0.6.2 tabulate-0.9.0 tiktoken-0.3.3 vdom-0.6 yarl-1.9.2
WARNING: Upgrading ipython, ipykernel, tornado, prompt-toolkit or pyzmq can
cause your runtime to repeatedly crash or behave in unexpected ways and is not
recommended. If your runtime won't connect or execute code, you can reset it
with "Disconnect and delete runtime" from the "Runtime" menu.
WARNING: The following packages were previously imported in this runtime:
[IPython]
You must restart the runtime in order to use newly installed versions.
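This message means that pip hit a dependency conflict while installing packages in the Google Colaboratory runtime. Specifically, installing genai pulled in ipython 8.13.2, but the preinstalled google-colab package requires ipython 7.34.0, so the two versions conflict. The last warning also says that IPython was already imported in this runtime, so the newly installed version is not used until the runtime is restarted.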
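If you want to see which versions are actually installed, you can read the package metadata from Python. This is a minimal sketch using the standard library's importlib.metadata; `installed_versions` is a hypothetical helper name, not part of genai.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(*packages: str) -> dict:
    """Map each distribution name to its installed version, or None if it is not installed."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Compare the ipython that pip installed against what google-colab expects (7.34.0).
print(installed_versions("ipython", "google-colab", "genai", "openai"))
```

This reports what is installed on disk. The IPython already imported into the running kernel keeps its old version until the runtime restarts, which is what the warning above is about.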
Restart the runtime and run the install command again.
pip install genai
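Get an OPENAI_API_KEY
https://platform.openai.com/account/api-keys
Create an API key for ChatGPT (the OpenAI API) on this page.
Handle the key with care: anyone who has it can make API requests billed to your account.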
Set OPENAI_API_KEY as an environment variable
import os
os.environ['OPENAI_API_KEY'] = 'YOUR_API_KEY'
When the Colaboratory session ends, the environment variable is lost too, so you have to set the API key again every time you reopen Colaboratory.
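Since the key has to be re-entered each session anyway, one option is to type it in at runtime instead of writing it into the cell. A minimal sketch, assuming you are fine entering the key once per session; `ensure_openai_api_key` is a hypothetical helper that uses the standard library's getpass so the key is not echoed.

```python
import os
from getpass import getpass


def ensure_openai_api_key(prompt=getpass) -> str:
    """Return OPENAI_API_KEY, asking for it only if this session doesn't have it yet.

    `prompt` defaults to getpass, which does not echo what you type, so the key
    never appears in the cell's source code.
    """
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        key = prompt("OpenAI API key: ").strip()
        if not key:
            raise ValueError("No API key entered")
        os.environ["OPENAI_API_KEY"] = key
    return key


# Usage: call ensure_openai_api_key() once per session, before %load_ext genai.
```

Passing `prompt` as a parameter keeps the function testable without an interactive input box.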
Load the IPython extension
%load_ext genai
Features
- Generate code from natural language with the %%assist magic command
- Custom exception suggestions
Custom Exception Suggestions
import pandas as pd
df = pd.DataFrame(dict(col1=['a', 'b', 'c']), index=['first', 'second', 'third'])
df.sort_values()
↓ Output
TypeError Traceback (most recent call last)
Cell In[5], line 5
1 import pandas as pd
3 df = pd.DataFrame(dict(col1=['a', 'b', 'c']), index=['first', 'second', 'third'])
----> 5 df.sort_values()
File /usr/local/lib/python3.10/dist-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
325 if len(args) > num_allow_args:
326 warnings.warn(
327 msg.format(arguments=_format_argument_list(allow_args)),
328 FutureWarning,
329 stacklevel=find_stack_level(),
330 )
--> 331 return func(*args, **kwargs)
TypeError: DataFrame.sort_values() missing 1 required positional argument: 'by'
↓ The AI suggests a fix along with example code
Suggestion
The error message shows that the sort_values() method is missing a required argument 'by'. You need to specify the column by which you want to sort the DataFrame. Try updating your code like this:
import pandas as pd
df = pd.DataFrame(dict(col1=['a', 'b', 'c']), index=['first', 'second', 'third'])
df_sorted = df.sort_values(by='col1')
This sorts the DataFrame by the 'col1' column and assigns it to a new variable df_sorted.
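To produce a suggestion like this, the extension presumably has to give the model both the failing code and the error. As an illustration only (this is not genai's implementation), the sketch below collects exactly that context: the cell's source plus its formatted traceback. `build_error_prompt` and the prompt wording are hypothetical, and a plain `sort_values` function stands in for the pandas method so the example runs without pandas.

```python
import traceback


def build_error_prompt(cell_source: str, exc: BaseException) -> str:
    """Hypothetical helper: bundle a cell's code and its traceback into one prompt string."""
    tb_text = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return (
        "The following notebook cell raised an exception.\n\n"
        f"Code:\n{cell_source}\n\n"
        f"Traceback:\n{tb_text}\n"
        "Explain the cause and suggest a fix."
    )


# Stand-in for the failing pandas call: a plain function with a required `by` argument.
def sort_values(by):
    return sorted(by)


cell_source = "sort_values()"
try:
    sort_values()  # raises TypeError: missing required positional argument 'by'
except TypeError as err:
    prompt = build_error_prompt(cell_source, err)

print(prompt)
```

The resulting string contains the same "missing 1 required positional argument: 'by'" message seen in the traceback above, which is the key clue the suggestion relies on.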
Example
%%assist
Can you explain this query to me so I can be sure we're doing the right things?
%%sql
SELECT
COUNT(*) AS num_downloads,
DATE_TRUNC(DATE(timestamp), DAY) AS day
FROM `bigquery-public-data.pypi.file_downloads`
WHERE
file.project = 'genai'
-- Only query the last 6 months of history
AND DATE(timestamp)
BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH)
AND CURRENT_DATE()
GROUP BY day
ORDER BY day DESC
↓ Output
This SQL query retrieves the number of downloads of all files associated with the project genai from the public pypi dataset in BigQuery over the last 6 months. It then groups these downloads by day and orders them in descending order.
Here's a step-by-step breakdown of the query:
- SELECT COUNT(*) AS num_downloads: selects and counts all of the rows in the table and gives the resulting column an alias of num_downloads.
- DATE_TRUNC(DATE(timestamp), DAY) AS day: truncates the timestamp down to the day and labels this resulting column as day.
- FROM `bigquery-public-data.pypi.file_downloads`: specifies the table to query.
- WHERE file.project = 'genai': filters only the files that are associated with the genai project.
- AND DATE(timestamp) BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH) AND CURRENT_DATE(): filters only the files that were downloaded in the last 6 months. This uses several date functions to retrieve the date range: DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH) calculates the date 6 months ago, DATE_TRUNC(..., MONTH) returns the first day of the month for a given date, and then DATE(...) casts that to a date value to match the datatype of the timestamp field. Lastly, BETWEEN is used to select all records where the timestamp is between the truncated date 6 months ago and the current date.
- GROUP BY day: groups the resulting count of downloads per day together to create a summary.
- ORDER BY day DESC: orders the results by the day in descending order so that the most recent day is at the top.
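To double-check the explanation, here is a small local simulation of the same query in plain Python over a few made-up rows. It is a sketch under assumptions: the rows are hypothetical (timestamp, project) pairs with naive datetimes, whereas the real table stores UTC TIMESTAMP values and keeps the project name in the nested file.project field. In the simulation, DATE(timestamp) corresponds to converting each timestamp to a date so it can be compared with the date bounds, and COUNT(*) counts rows per day group. It also makes visible that, because of DATE_TRUNC(..., MONTH), the window starts on the first day of the month six months back, so it can cover slightly more than six months.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, datetime
from typing import Iterable


def window_start(today: date, months: int = 6) -> date:
    """First day of the month `months` calendar months before `today`.

    Mirrors DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH);
    the day of month is discarded by the truncation, so only year/month matter.
    """
    index = today.year * 12 + (today.month - 1) - months
    year, month0 = divmod(index, 12)
    return date(year, month0 + 1, 1)


def daily_downloads(rows: Iterable[tuple[datetime, str]], project: str, today: date) -> list[tuple[date, int]]:
    """Count downloads per day for `project` inside the window, newest day first."""
    start = window_start(today)
    counts = Counter(
        ts.date()                           # DATE(timestamp), i.e. truncated to the day
        for ts, proj in rows
        if proj == project                  # WHERE file.project = 'genai'
        and start <= ts.date() <= today     # BETWEEN ... AND CURRENT_DATE() (inclusive)
    )
    # GROUP BY day + COUNT(*) is the Counter; ORDER BY day DESC is the reverse sort.
    return sorted(counts.items(), reverse=True)


rows = [
    (datetime(2023, 5, 27, 9, 0), "genai"),
    (datetime(2023, 5, 27, 18, 30), "genai"),
    (datetime(2023, 5, 28, 8, 15), "genai"),
    (datetime(2023, 5, 28, 8, 20), "openai"),      # other project, excluded
    (datetime(2022, 10, 31, 23, 59), "genai"),     # before the window start, excluded
]
print(window_start(date(2023, 5, 28)))  # 2022-11-01
print(daily_downloads(rows, "genai", date(2023, 5, 28)))
# [(datetime.date(2023, 5, 28), 1), (datetime.date(2023, 5, 27), 2)]
```

Running it with a "today" of 2023-05-28 gives a window starting on 2022-11-01, which is a little longer than six months.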