
Installing Dataiku DSS: Ubuntu Edition

Published 2023/05/02

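Lately there have been a lot of articles about Snowpark for Python as a way to do machine learning on Snowflake. Dataiku can do machine learning too, and it also ships with plenty of other nice little features, so it may well lower the barrier to getting started.
So this time, I'd like to try installing the on-premises version of Dataiku.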

What is Dataiku?

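• With Dataiku, you can use AI-driven applications across your organization. Its capabilities include:

  • Data preparation
  • Visualization
  • Machine learning
  • DataOps
  • MLOps
  • Analytics apps

Note: these are only examples.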

See the official documentation for details.

Environment used for this Dataiku build

• The base is Ubuntu 22.04 running in VirtualBox on a Windows machine.
• Memory is set to 8 GB.
• Disk is set to 100 GB.
At first I tried to install it in a Docker machine on Docker for Windows, but the error "dss: dss supervisor is not running" would not go away, so I switched to VirtualBox.
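Before running the installer, it can be worth confirming what the guest actually sees. The sketch below is not part of the original steps; the helper names `kb_to_gib` and `total_ram_gib` are made up for illustration, and it assumes a Linux guest that exposes `/proc/meminfo`.

```shell
# Convert a kB value (the unit used in /proc/meminfo) to whole GiB, rounding down.
kb_to_gib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Print total RAM in whole GiB from a meminfo-style file (default: /proc/meminfo).
total_ram_gib() {
    meminfo="${1:-/proc/meminfo}"
    kb=$(awk '/^MemTotal:/ {print $2}' "$meminfo")
    kb_to_gib "$kb"
}

# Usage on the VM:
#   total_ram_gib      # RAM visible to the guest
#   df -h "$HOME"      # free disk space where the data directory will live
```

Because the kernel reserves some memory, a VM configured with 8 GB usually reports slightly less than 8 GiB here, so the rounded-down result can come out as 7.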

Installing by following the official documentation

https://www.dataiku.com/ja/製品/始める/linux/

Download DSS itself onto Ubuntu

wget https://cdn.downloads.dataiku.com/public/dss/11.3.2/dataiku-dss-11.3.2.tar.gz
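If you want to fetch a different release, the version number appears twice in the URL above. A minimal sketch that builds the URL from a single version string follows; `dss_tarball_url` is a hypothetical helper, and it assumes other releases use the same path layout as 11.3.2, which is not confirmed here.

```shell
# Build the download URL for a given DSS version.
# Assumption: every release follows the same path layout as 11.3.2.
dss_tarball_url() {
    version="$1"
    echo "https://cdn.downloads.dataiku.com/public/dss/${version}/dataiku-dss-${version}.tar.gz"
}

# Usage:
#   wget "$(dss_tarball_url 11.3.2)"
```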

If you want the latest version of Dataiku, you can get it from the official site.

Extract the downloaded Dataiku archive

tar xzf dataiku-dss-11.3.2.tar.gz

Install

dataiku@dataiku:~$ dataiku-dss-11.3.2/installer.sh -d DATA_DIR -p 11000
*********************************************
*           Dataiku DSS installer           *
*********************************************
[+] Creating data directory: DATA_DIR
[+] Saving installation log to /home/dataiku/DATA_DIR/run/install.log
[*] Could not find suitable version of Java
[+] Checking required dependencies
+ Detected OS distribution : ubuntu 22.04
+ Checking required packages...
*** Error: package acl not found
*** Error: package libncurses5 not found
*** Error: package nginx not found
*** Error: package unzip not found
*** Error: package zip not found
*** Error: package default-jre-headless not found
*** Error: package python2.7 not found
*** Error: package libpython2.7 not found
*** Error: package libgomp1 not found
*** Error: package python3.7 not found

[-] Dependency check failed
[-] You can install required dependencies with:
[-]    sudo -i "/home/dataiku/dataiku-dss-11.3.2/scripts/install/install-deps.sh"
[-] You can also disable this check with the -n installer flag

A whole pile of "package not found" errors came out, and installing them one at a time would be a real pain.
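To see at a glance which packages the installer is complaining about, the error lines can be filtered down to just the package names. This is only a sketch: `missing_packages` is a made-up helper, and it assumes the lines look exactly like the `*** Error: package ... not found` lines in the output above.

```shell
# Read installer output on stdin and print one missing package name per line.
# Matches lines of the form: *** Error: package acl not found
missing_packages() {
    sed -n 's/^\*\*\* Error: package \(.*\) not found$/\1/p'
}

# Usage, assuming the installer output was saved to a file
# (installer-output.txt is a hypothetical name):
#   missing_packages < installer-output.txt
```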

Resolving the package dependencies

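I'm not going to install them one at a time. The installer output points to a handy script that installs the dependencies, so I'll use that.
Let's run it.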

dataiku@dataiku:~$ sudo -i "/home/dataiku/dataiku-dss-11.3.2/scripts/install/install-deps.sh"
[sudo] password for dataiku:
+ Detected OS distribution : ubuntu 22.04
+ Checking required repositories...
+ Adding 'deadsnakes' PPA repository for python 3.7 ...
Traceback (most recent call last):
  File "/usr/bin/add-apt-repository", line 364, in <module>
    sys.exit(0 if addaptrepo.main() else 1)
  File "/usr/bin/add-apt-repository", line 347, in main
    shortcut = handler(source, **shortcut_params)
  File "/usr/lib/python3/dist-packages/softwareproperties/shortcuts.py", line 40, in shortcut_handler
    return handler(shortcut, **kwargs)
  File "/usr/lib/python3/dist-packages/softwareproperties/ppa.py", line 82, in __init__
    if self.lpppa.publish_debug_symbols:
  File "/usr/lib/python3/dist-packages/softwareproperties/ppa.py", line 120, in lpppa
    self._lpppa = self.lpteam.getPPAByName(name=self.ppaname)
  File "/usr/lib/python3/dist-packages/softwareproperties/ppa.py", line 107, in lpteam
    self._lpteam = self.lp.people(self.teamname)
  File "/usr/lib/python3/dist-packages/softwareproperties/ppa.py", line 98, in lp
    self._lp = login_func("%s.%s" % (self.__module__, self.__class__.__name__),
  File "/usr/lib/python3/dist-packages/launchpadlib/launchpad.py", line 494, in login_anonymously
    return cls(
  File "/usr/lib/python3/dist-packages/launchpadlib/launchpad.py", line 230, in __init__
    super(Launchpad, self).__init__(
  File "/usr/lib/python3/dist-packages/lazr/restfulclient/resource.py", line 472, in __init__
    self._wadl = self._browser.get_wadl_application(self._root_uri)
  File "/usr/lib/python3/dist-packages/lazr/restfulclient/_browser.py", line 447, in get_wadl_application
    response, content = self._request(url, media_type=wadl_type)
  File "/usr/lib/python3/dist-packages/lazr/restfulclient/_browser.py", line 389, in _request
    response, content = self._request_and_retry(
  File "/usr/lib/python3/dist-packages/lazr/restfulclient/_browser.py", line 359, in _request_and_retry
    response, content = self._connection.request(
  File "/usr/lib/python3/dist-packages/httplib2/__init__.py", line 1725, in request
    (response, content) = self._request(
  File "/usr/lib/python3/dist-packages/launchpadlib/launchpad.py", line 144, in _request
    response, content = super(LaunchpadOAuthAwareHttp, self)._request(
  File "/usr/lib/python3/dist-packages/lazr/restfulclient/_browser.py", line 184, in _request
    return super(RestfulHttp, self)._request(
  File "/usr/lib/python3/dist-packages/httplib2/__init__.py", line 1441, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/lib/python3/dist-packages/httplib2/__init__.py", line 1363, in _conn_request
    conn.connect()
  File "/usr/lib/python3/dist-packages/httplib2/__init__.py", line 1153, in connect
    sock.connect((self.host, self.port))
TimeoutError: [Errno 110] Connection timed out

Straight into an error...
After some digging, it looks like disabling IPv6 resolves it.

dataiku@dataiku:~$ sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.all.disable_ipv6 = 1
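Note that `sysctl -w` only changes the running kernel's setting. It is not written to any config file, so it does not survive a reboot (persisting it would mean adding the line to a file under /etc/sysctl.d/). A small sketch for checking the current state follows; `ipv6_state` is a hypothetical helper name.

```shell
# Report whether IPv6 is disabled, based on a sysctl value file.
# The default path is the kernel's live setting for all interfaces.
ipv6_state() {
    f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    if [ "$(cat "$f")" = "1" ]; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Usage:
#   ipv6_state
#   # Re-enable IPv6 once the dependencies are installed, if you want it back:
#   sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
```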

Dependency resolution, take two!

The script stops and asks for keyboard input twice, so answer each prompt as it comes.
If it runs through without errors, you're good.

dataiku@dataiku:~$ sudo -i "/home/dataiku/dataiku-dss-11.3.2/scripts/install/install-deps.sh"
+ Detected OS distribution : ubuntu 22.04
+ Checking required repositories...
+ Adding 'deadsnakes' PPA repository for python 3.7 ...
Repository: 'deb https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu/ jammy main'
~ snip ~
For nightly builds, see ppa:deadsnakes/nightly https://launchpad.net/~deadsnakes/+archive/ubuntu/nightly
More info: https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa
Adding repository.
Press [ENTER] to continue or Ctrl-c to cancel. ←★ press Enter
~ snip ~
After this operation, 225 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y ←★ press Y
~ snip ~
Synchronizing state of nginx.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable nginx
Removed /etc/systemd/system/multi-user.target.wants/nginx.service.

Try the install once more

dataiku@dataiku:~$ dataiku-dss-11.3.2/installer.sh -d DATA_DIR -p 11000
*********************************************
*           Dataiku DSS installer           *
*********************************************
[!] *****************************************************
[!] DATA_DIR contains a previously failed install of DSS
[!] Moving it out of the way and proceeding
[!] *****************************************************
[+] Creating data directory: DATA_DIR
[+] Saving installation log to /home/dataiku/DATA_DIR/run/install.log
[+] Using Java at /usr/bin/java : openjdk version "11.0.18" 2023-01-17
[+] Checking required dependencies
+ Detected OS distribution : ubuntu 22.04
+ Checking required packages...
[+] Installation starting
[+] Initializing Python environment
[+] Initializing Python environment using platform default
+ Using default base Python for this platform : python3.7
created virtual environment CPython3.7.16.final.0-64 in 624ms
  creator CPython3Posix(dest=/home/dataiku/DATA_DIR/pyenv, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/dataiku/.local/share/virtualenv)
    added seed packages: pip==22.1.2, setuptools==62.6.0, wheel==0.37.1
  activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
[+] Precompiling Dataiku Python code
[+] Precompiling Jupyter Python code
[+] Precompiling third-party Python 3.7 code
[!] Small RAM detected, using low-memory mode, may not be suitable for production setups
[+] Performing initial install
[+] Writing version metadata conf=11300 product=11.3.2 revision=
[+] Writing default install config file
[+] Writing default env file
[+] Preparing data directory initial data
[2023/05/02-04:47:40.993] [main] [INFO] [dku.logging]  - Loading logging settings
[2023/05/02-04:47:41.012] [main] [INFO] [dku.logging]  - Configuring additional logging settings from /home/dataiku/dataiku-dss-11.3.2/resources/logging/dku-log4j.properties
[2023/05/02-04:47:41.092] [main] [INFO] [dku.logging]  - Configuring additional JUL logging settings from /home/dataiku/dataiku-dss-11.3.2/resources/logging/dku-log-jul.properties
Installed kernelspec python3 in /home/dataiku/DATA_DIR/jupyter-run/jupyter/kernels/python3
Installing /home/dataiku/dataiku-dss-11.3.2/python37.packages/widgetsnbextension/static -> jupyter-js-widgets
Making directory: /home/dataiku/DATA_DIR/jupyter-run/jupyter/nbextensions/jupyter-js-widgets/
Copying: /home/dataiku/dataiku-dss-11.3.2/python37.packages/widgetsnbextension/static/extension.js -> /home/dataiku/DATA_DIR/jupyter-run/jupyter/nbextensions/jupyter-js-widgets/extension.js
Copying: /home/dataiku/dataiku-dss-11.3.2/python37.packages/widgetsnbextension/static/extension.js.map -> /home/dataiku/DATA_DIR/jupyter-run/jupyter/nbextensions/jupyter-js-widgets/extension.js.map
- Validating: OK

    To initialize this nbextension in the browser every time the notebook (or other app) loads:

          jupyter nbextension enable widgetsnbextension --user --py

Extension collapsible_headings/main enabled successfully
Extension codefolding/main enabled successfully
Extension toggle_all_line_numbers/main enabled successfully
Extension hide_input_all/main enabled successfully
Extension addbefore/main enabled successfully
Extension jupyter-js-widgets/extension enabled successfully
[+] Generating default env file
[+] Generating supervisor configuration
[+] Generating nginx configuration
***************************************************************
* Installation complete (DSS node type: design)
* Next, start DSS using:
*         '/home/dataiku/DATA_DIR/bin/dss start'
* Dataiku DSS will be accessible on http://<SERVER ADDRESS>:11000
*
* You can configure Dataiku DSS to start automatically at server boot with:
*    sudo -i "/home/dataiku/dataiku-dss-11.3.2/scripts/install/install-boot.sh" "/home/dataiku/DATA_DIR" dataiku
***************************************************************

Starting the Dataiku service

dataiku@dataiku:~$ /home/dataiku/DATA_DIR/bin/dss start
Waiting for DSS supervisor to start ...
backend                          STARTING
ipython                          STARTING
nginx                            STARTING
DSS started, pid=17525
Waiting for DSS backend to start ...............

Checking the Dataiku service status

dataiku@dataiku:~$ /home/dataiku/DATA_DIR/bin/dss status
backend                          RUNNING   pid 17528, uptime 0:13:23
ipython                          RUNNING   pid 17529, uptime 0:13:23
nginx                            RUNNING   pid 17530, uptime 0:13:23
dataiku@dataiku:~$
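Right after `dss start`, the processes show STARTING for a while before they become RUNNING. If you want to script the check rather than eyeball it, something like the following could work. It is a sketch that assumes the status output keeps the "name STATE ..." column layout shown above; `all_running` is a made-up helper.

```shell
# Read `dss status` output on stdin.
# Succeeds only if at least one process is listed and every one is RUNNING.
all_running() {
    awk 'NF >= 2 { n++; if ($2 != "RUNNING") bad++ }
         END { exit !(n > 0 && bad == 0) }'
}

# Usage:
#   /home/dataiku/DATA_DIR/bin/dss status | all_running && echo "DSS is up"
```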

Access Dataiku from a browser
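The installer output says DSS is reachable at http://<SERVER ADDRESS>:11000, where 11000 is the value passed with `-p`. Before opening a browser, you can check from inside the VM that something is answering on that port. The sketch below assumes `curl` is installed; `dss_url` is a hypothetical helper. When browsing from the Windows host, replace `localhost` with the VM's address, which depends on how the VirtualBox network is configured.

```shell
# Build the DSS URL for a host and port.
# Defaults: localhost and 11000 (the -p value used with installer.sh).
dss_url() {
    host="${1:-localhost}"
    port="${2:-11000}"
    echo "http://${host}:${port}/"
}

# Usage (prints only the HTTP status code of the response):
#   curl -s -o /dev/null -w '%{http_code}\n' "$(dss_url)"
```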
