Intel DL SDK tool installation failed

Hi,

I installed this tool on two machines, and both failed with the same error when the installer tried to pull the dl-training-tool Docker image from the server:

Log:

release-latest: Pulling from intelcorp/dl-training-tool

read tcp 10.239.36.135:37139->10.239.4.160:913: read: connection reset by peer
Error on or near line 395; exiting with status 1

 

Does anyone know what's wrong with my installation steps?


Hi Zhouhai,

This error usually means that the Docker image cannot be downloaded because of incorrect proxy settings.
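Before changing the Docker configuration, it can help to confirm from the server that a proxy is actually set and reachable. The sketch below is a hypothetical helper, not part of the installer. It assumes the proxy is exported in the usual "HTTP_PROXY"/"HTTPS_PROXY" environment variables with an explicit port (the "http://<your_proxy_server>:<port>" form used in this thread), and it only tests whether a TCP connection succeeds.

```python
"""Minimal sketch: check whether a proxy is configured and reachable.

Assumption: the proxy is given in the usual HTTP_PROXY / HTTPS_PROXY
environment variables (upper- or lower-case). Nothing here is part of
the DL SDK installer; all function names are hypothetical.
"""
import os
import socket
from urllib.parse import urlsplit


def configured_proxy(env=None):
    """Return the first proxy URL found in the environment, or None."""
    env = os.environ if env is None else env
    for name in ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"):
        value = env.get(name)
        if value:
            return value
    return None


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def proxy_reachable(proxy_url, timeout=3.0):
    """Parse a URL such as http://proxy.example.com:911 and try to connect.

    Assumes an explicit port; URLs without one are reported as unreachable.
    """
    parts = urlsplit(proxy_url)
    if not parts.hostname or not parts.port:
        return False
    return can_connect(parts.hostname, parts.port, timeout)


if __name__ == "__main__":
    proxy = configured_proxy()
    if proxy is None:
        print("No proxy variable set; Docker will try to reach the registry directly.")
    elif proxy_reachable(proxy):
        print(f"Proxy {proxy} accepts TCP connections.")
    else:
        print(f"Proxy {proxy} is not reachable; check the host and port.")
```

If the proxy turns out to be unreachable, a wrong host or port may explain "connection reset by peer" or "i/o timeout" errors like the ones in the logs in this thread.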

If you use Ubuntu 15.04 or later, please check whether the "/etc/systemd/system/docker.service.d/http-proxy.conf" file exists and contains a valid HTTP proxy value. If it doesn't exist, please create it with the following content:

[Service]
Environment=HTTP_PROXY=http://<your_proxy_server>:<port>

After that, run "systemctl daemon-reload" to reload the systemd configuration and "systemctl restart docker" to restart the Docker service, then re-run the install script. That should help.
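The same drop-in file can also be generated with a short script. This is a minimal sketch under stated assumptions: HTTPS_PROXY and NO_PROXY are optional additions not mentioned in the reply (Docker's systemd documentation lists them alongside HTTP_PROXY, and registry pulls go over HTTPS), and the proxy address in the example is a placeholder.

```python
"""Minimal sketch: build and write the Docker systemd proxy drop-in.

Assumptions (not from the installer): HTTPS_PROXY and NO_PROXY are optional
extras; the proxy address below is a hypothetical placeholder.
"""
from pathlib import Path

DROP_IN = Path("/etc/systemd/system/docker.service.d/http-proxy.conf")


def render_drop_in(http_proxy, https_proxy=None, no_proxy=None):
    """Build the [Service] section with one Environment= line per variable."""
    lines = ["[Service]", f"Environment=HTTP_PROXY={http_proxy}"]
    if https_proxy:
        lines.append(f"Environment=HTTPS_PROXY={https_proxy}")
    if no_proxy:
        lines.append(f"Environment=NO_PROXY={no_proxy}")
    return "\n".join(lines) + "\n"


def write_drop_in(content, path=DROP_IN):
    """Create the drop-in directory if needed and write the file.

    Writing under /etc requires root privileges.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return path


if __name__ == "__main__":
    # Hypothetical proxy address; replace it with your own server and port.
    text = render_drop_in(
        "http://proxy.example.com:911",
        https_proxy="http://proxy.example.com:911",
        no_proxy="localhost,127.0.0.1",
    )
    print(text)  # inspect before calling write_drop_in(text) as root
```

After writing the file, run "systemctl daemon-reload" and "systemctl restart docker" so the Docker daemon picks up the new environment.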

Thanks,
Alexey

Alexey, thank you for your help.

Now I can upload data, create a dataset, set up a model, and train it. Everything works well.

Hi,

I installed the SDK, but every time I upload data, the upload fails.
Can anyone help me, please?

Hi Sreehari S.,

Can you please make sure your compressed folder (zip, rar, etc.) matches the required format?
(You can see a simple diagram via the link on the upload page.)

If it still doesn't work, please attach screenshots of the issue; they will help us understand it better and focus on it.

Thanks!

Barak.

When I upload the file, it shows 100% uploaded and then fails.
When I try to upload it again with the same file name, it says the file is already in that location.

Attachment: Untitled.png (image/png, 294.75 KB)

Hi,

I am sorry to hear that you are still facing issues with the upload.
The behavior you describe can still occur if the compressed folder you uploaded doesn't match the required format.
I suggest the following steps:
1. Make sure you have enough disk space on your machine.
2. Verify that the compressed folder you are trying to upload is valid. I have attached the explanation provided in the tool.
3. When you upload the file again, use a new folder name in the "To" field.
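Steps 1 and 2 can be partly scripted before uploading. This is a minimal sketch, assuming a ZIP archive (Python's standard library cannot read RAR) and approximating "enough disk space" as the uncompressed size times a hypothetical 1.5x margin; the tool's actual space requirement is not stated in this thread. Run it on the machine where the data will be stored.

```python
"""Minimal sketch of upload pre-checks for a .zip archive.

Assumptions: ZIP only; the 1.5x disk-space margin is a hypothetical
safety factor, not a documented requirement of the Training Tool.
"""
import shutil
import zipfile
import zlib


def uncompressed_size(archive_path):
    """Sum of the uncompressed sizes of all members in the ZIP archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def archive_is_intact(archive_path):
    """Return True if the file is a readable ZIP whose members all pass the CRC check."""
    if not zipfile.is_zipfile(archive_path):
        return False
    try:
        with zipfile.ZipFile(archive_path) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        return False


def enough_disk_space(archive_path, target_dir, margin=1.5):
    """Compare free space in target_dir with the uncompressed size times margin."""
    free = shutil.disk_usage(target_dir).free
    return free >= uncompressed_size(archive_path) * margin
```

Call archive_is_intact first; uncompressed_size and enough_disk_space assume the file is already a valid ZIP.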

If you still have issues, please let us know and include a screenshot, log, or error notification text in your comment.

In addition, our new version, planned for release in the coming days, supports better error handling for uploads and includes various bug fixes. I encourage you to try it once it is released (you should receive an email about it).

Thanks,

Barak.

 

Attachment: ImageCompressedFolder.jpg (image/jpeg, 71.1 KB)

Hi,

I am still stuck on the problem with uploading the data file.
I uploaded files in different formats (zip, rar, etc.), and all of them show the same error.

How much memory is needed to run it?

I run it on my laptop (Intel i5, Windows 10), and the server has an Intel Xeon processor (Ubuntu).
I have attached screenshots of the error.

Attachments:
error screenshot.png (image/png, 162.98 KB)
error home screen.png (image/png, 156.66 KB)

Hi,

The amount of memory required depends on the size of the file you are uploading.
I suspect the issue is with the file's sub-folders. Please verify again that the file contains only sub-folders (one for each image class).
You can check this by extracting the RAR file and confirming that it contains more than one folder.
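The folder-structure check can also be done without extracting the archive. This sketch assumes, based only on the description in this reply, that a valid upload holds nothing but top-level class folders and at least two of them; it handles ZIP files only, and the function name and return format are hypothetical.

```python
"""Minimal sketch: check that a ZIP upload contains only class sub-folders.

Assumption (from this thread only): every entry must live inside a
top-level folder, and there must be at least two such folders.
"""
import zipfile


def check_class_folders(archive_path):
    """Return (sorted class folder names, list of problem descriptions)."""
    classes = set()
    problems = []
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            top, sep, _rest = name.partition("/")
            if not sep:
                # A file sitting at the archive root, outside any class folder.
                problems.append(f"file outside a class folder: {name}")
            else:
                classes.add(top)
    if len(classes) < 2:
        problems.append(f"expected at least 2 class folders, found {len(classes)}")
    return sorted(classes), problems
```

A common mistake is an archive that wraps everything in one extra folder (for example "dataset/cats/..."); the sketch reports that as a single class folder, which corresponds to failing the "more than one folder" test.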

If you are still facing issues, please attach a screenshot of the folders inside the RAR file.

Our new version (available at https://software.intel.com/en-us/deep-learning-training-tool) has better error handling: if the issue is caused by the file structure, you will get a notification about it.

Barak.

Hi,

I get this error when using the Windows installer to install the Intel DL SDK on a local Ubuntu 14.04 machine:

Here is the full log:

11:03:43 error docker containers cannot be started 

11:03:43 error error: <password>Error: Command failed: tools\win\plink.exe 115.28.226.116 -l root -pw <password> -batch -t -ssh "chmod +x dlsdk_install_scripts/*.sh && echo 'passwd' | sudo -S -E -k dlsdk_install_scripts/install_training_tool.sh -stage 5 -type multi -username 'root' -toolpassword '<password>' -startport 31000 -selfsigncertificate 'undefined' -volume ~/ "  

11:03:43 info  

11:03:43 info Error on or near line 84; exiting with status 1 

11:03:43 info !!! [0420 11:03:30] etcd failed to start. Exiting... 

11:03:43 info See 'docker run --help'. 

11:03:43 info docker: Error response from daemon: Get https://gcr.io/v1/_ping: dial tcp 64.233.187.82:443: i/o timeout. 

11:03:43 info Unable to find image 'gcr.io/google_containers/etcd-amd64:3.0.4' locally 

11:03:43 info +++ [0420 11:02:26] Launching etcd... 

11:03:43 info +++ [0420 11:02:25] Launching docker bootstrap... 

11:03:43 info +++ [0420 11:02:25] Killing all kubernetes containers... 

11:03:43 info +++ [0420 11:02:25] -------------------------------------------- 

11:03:43 info +++ [0420 11:02:25] USE_CONTAINERIZED is set to: false 

11:03:43 info +++ [0420 11:02:25] USE_CNI is set to: false 

11:03:43 info +++ [0420 11:02:25] IP_ADDRESS is set to: 115.28.226.116 

11:03:43 info +++ [0420 11:02:25] ARCH is set to: amd64 

11:03:43 info +++ [0420 11:02:25] MASTER_IP is set to: localhost 

11:03:43 info +++ [0420 11:02:25] RESTART_POLICY is set to: unless-stopped 

11:03:43 info +++ [0420 11:02:25] FLANNEL_BACKEND is set to: udp 

11:03:43 info +++ [0420 11:02:25] FLANNEL_NETWORK is set to: 172.16.0.0/16 

11:03:43 info +++ [0420 11:02:25] FLANNEL_IPMASQ is set to: true 

11:03:43 info +++ [0420 11:02:25] FLANNEL_VERSION is set to: v0.6.1 

11:03:42 info +++ [0420 11:02:25] ETCD_VERSION is set to: 3.0.4 

11:03:42 info +++ [0420 11:02:25] K8S_VERSION is set to: v1.5.2 

11:03:42 info +++ [0420 11:02:23] Done. 

11:03:42 info +++ [0420 11:02:23] Killing all kubernetes containers... 

11:03:42 info 0 upgraded, 0 newly installed, 0 to remove and 154 not upgraded. 

11:03:42 info curl is already the newest version. 

11:03:42 info Reading state information... Done

11:03:42 info Building dependency tree

11:03:42 info Reading package lists... Done

11:03:42 info COMMANMD=master 

11:03:42 info Linux distribution: ubuntu 

11:03:42 info multi node installation 

11:03:42 info chown -R dlsdk-user:dlsdk-group /root//dlsdk/ 

11:03:42 info mkdir -p /root//dlsdk//dlsdk/security 

11:03:42 info mkdir -p /root//dlsdk/ 

11:03:42 info volume=/root/ 

11:03:42 info selfsigncertificate = undefined 

11:03:42 info etcd_port=4001 

11:03:42 info tf_rest_port=8010 

11:03:42 info caffe_jupyter_port=8001 

11:03:42 info caffe_rest_port=8000 

11:03:42 info js_jupyter_tf_port=31002 

11:03:42 info js_jupyter_caffe_port=31001 

11:03:42 info js_ui_port=31000 

11:03:42 info username = root 

11:03:42 info type = multi 

11:03:42 info stage = 5 

11:03:42 info Version: 14.04 

11:03:41 info OS: ubuntu 

11:03:41 info DLSDK_ETCD_CONTAINER: intelcorp/dl-training-tool:etcd 

11:03:41 info DLSDK_JS_CONTAINER: intelcorp/dl-training-tool:js-release3-latest 

11:03:41 info DLSDK_TF_CONTAINER: intelcorp/dl-training-tool:tf-release3-latest 

11:03:41 info DLSDK_CAFFE_CONTAINER: intelcorp/dl-training-tool:caffe-release3-mlsl-latest 

11:03:41 info Installer version: 1.0.1062  

11:03:41 info output:  

11:02:29 info run: <password>tools\win\plink.exe 115.28.226.116 -l root -pw <password> -batch -t -ssh chmod +x dlsdk_install_scripts/*.sh && echo '<password>' | sudo -S -E -k dlsdk_install_scripts/install_training_tool.sh -stage 5 -type multi -username 'root' -toolpassword '<password>' -startport 31000 -selfsigncertificate 'undefined' -volume ~/  

11:02:29 info ---------------------------------------- Run docker image and install other dependencies 

11:02:29 info  

11:02:29 info  

11:02:29 info ports validation completed 

11:02:29 info 8080 8081 8082 4001 2380 2379 8285 10252 10255 4194 10250 6443 10251 

11:02:29 info validating ports 

11:02:29 info selfsigncertificate = undefined 

11:02:29 info etcd_port=4001 

11:02:29 info tf_rest_port=8010 

11:02:29 info caffe_jupyter_port=8001 

11:02:29 info caffe_rest_port=8000 

11:02:29 info js_jupyter_tf_port=31002 

11:02:29 info js_jupyter_caffe_port=31001 

11:02:29 info js_ui_port=31000 

11:02:29 info username = root 

11:02:29 info type = multi 

11:02:29 info stage = 4 

11:02:29 info Version: 14.04 

11:02:29 info OS: ubuntu 

11:02:29 info DLSDK_ETCD_CONTAINER: intelcorp/dl-training-tool:etcd 

11:02:29 info DLSDK_JS_CONTAINER: intelcorp/dl-training-tool:js-release3-latest 

11:02:28 info DLSDK_TF_CONTAINER: intelcorp/dl-training-tool:tf-release3-latest 

11:02:28 info DLSDK_CAFFE_CONTAINER: intelcorp/dl-training-tool:caffe-release3-mlsl-latest 

11:02:28 info Installer version: 1.0.1062  

11:02:28 info output:  

11:02:26 info run: <password>tools\win\plink.exe 115.28.226.116 -l root -pw <password> -batch -t -ssh chmod +x dlsdk_install_scripts/*.sh && echo '<password>' | sudo -S -E -k dlsdk_install_scripts/install_training_tool.sh -stage 4 -type multi -username 'root' -startport 31000 -selfsigncertificate 'undefined'  

11:02:26 info ---------------------------------------- Validate if ports required for the Training Tool are open 

11:02:26 info stop containers completed 

11:02:26 info  

11:02:26 info stop containers completed 

11:02:26 info Removing any system startup links for /etc/init.d/dlsdk ... 

11:02:26 info +++ [0420 11:02:14] Done. 

11:02:26 info +++ [0420 11:02:14] Killing all kubernetes containers... 

11:02:26 info +++ [0420 11:02:14] Killing docker bootstrap... 

11:02:26 info COMMANMD=remove 

11:02:26 info Linux distribution: ubuntu 

11:02:26 info selfsigncertificate = undefined 

11:02:26 info etcd_port=4001 

11:02:26 info tf_rest_port=8010 

11:02:26 info caffe_jupyter_port=8001 

11:02:26 info caffe_rest_port=8000 

11:02:26 info js_jupyter_tf_port=31002 

11:02:26 info js_jupyter_caffe_port=31001 

11:02:26 info js_ui_port=31000 

11:02:26 info username = root 

11:02:26 info type = multi 

11:02:26 info stage = 3 

11:02:25 info Version: 14.04 

11:02:25 info OS: ubuntu 

11:02:25 info DLSDK_ETCD_CONTAINER: intelcorp/dl-training-tool:etcd 

11:02:25 info DLSDK_JS_CONTAINER: intelcorp/dl-training-tool:js-release3-latest 

11:02:25 info DLSDK_TF_CONTAINER: intelcorp/dl-training-tool:tf-release3-latest 

11:02:25 info DLSDK_CAFFE_CONTAINER: intelcorp/dl-training-tool:caffe-release3-mlsl-latest 

11:02:25 info Installer version: 1.0.1062  

11:02:25 info output:  

11:02:22 info run: <password>tools\win\plink.exe 115.28.226.116 -l root -pw <password> -batch -t -ssh chmod +x dlsdk_install_scripts/*.sh && echo '<password>' | sudo -S -E -k dlsdk_install_scripts/install_training_tool.sh -stage 3 -type multi -username 'root' -startport 31000 -selfsigncertificate 'undefined'  

11:02:22 info ---------------------------------------- Stop docker containers from previous version of Training Tool 

11:02:22 info docker pull completed 

11:02:22 info  

11:02:22 info docker pull completed 

11:02:22 info Status: Image is up to date for intelcorp/dl-training-tool:etcd 

11:02:22 info Digest: sha256:6cb676a494614765b9115983e4f8b9668d06bf03ec840aeb3287dfb17e4fdd09 

11:02:22 info etcd: Pulling from intelcorp/dl-training-tool 

11:02:22 info Status: Image is up to date for intelcorp/dl-training-tool:js-release3-latest 

11:02:22 info Digest: sha256:9e4dcf1a2e73479b0097bbcfd3ccbcf93ec175c7b235c9a3a41dab7773c963d9 

11:02:22 info js-release3-latest: Pulling from intelcorp/dl-training-tool 

11:02:22 info Status: Image is up to date for intelcorp/dl-training-tool:tf-release3-latest 

11:02:22 info Digest: sha256:5f0a16fa1f22bbedbcf3501cd1d8a2e2b4724126147eba3d016f8e2dddfa0b55 

11:02:22 info tf-release3-latest: Pulling from intelcorp/dl-training-tool 

11:02:22 info Status: Image is up to date for intelcorp/dl-training-tool:caffe-release3-mlsl-latest 

11:02:22 info Digest: sha256:0cb3ae8440159a6103b85ac1ef6ed117e90d679855a694e7ffcad5870d26e421 

11:02:22 info caffe-release3-mlsl-latest: Pulling from intelcorp/dl-training-tool 

11:02:22 info selfsigncertificate = undefined 

11:02:22 info etcd_port=4001 

11:02:22 info tf_rest_port=8010 

11:02:22 info caffe_jupyter_port=8001 

11:02:22 info caffe_rest_port=8000 

11:02:22 info js_jupyter_tf_port=31002 

11:02:22 info js_jupyter_caffe_port=31001 

11:02:22 info js_ui_port=31000 

11:02:22 info username = root 

11:02:22 info type = multi 

11:02:21 info stage = 2 

11:02:21 info Version: 14.04 

11:02:21 info OS: ubuntu 

11:02:21 info DLSDK_ETCD_CONTAINER: intelcorp/dl-training-tool:etcd 

11:02:21 info DLSDK_JS_CONTAINER: intelcorp/dl-training-tool:js-release3-latest 

11:02:21 info DLSDK_TF_CONTAINER: intelcorp/dl-training-tool:tf-release3-latest 

11:02:21 info DLSDK_CAFFE_CONTAINER: intelcorp/dl-training-tool:caffe-release3-mlsl-latest 

11:02:21 info Installer version: 1.0.1062  

11:02:21 info output:  

11:01:46 info run: <password>tools\win\plink.exe 115.28.226.116 -l root -pw <password> -batch -t -ssh chmod +x dlsdk_install_scripts/*.sh && echo '<password>' | sudo -S -E -k dlsdk_install_scripts/install_training_tool.sh -stage 2 -type multi -username 'root' -startport 31000 -selfsigncertificate 'undefined'  

11:01:46 info ---------------------------------------- pull docker images for Training Tool 

11:01:46 info docker installation finished 

11:01:46 info  

11:01:46 info docker installation finished 

11:01:46 info isDockerProxyNeeded = false 

11:01:46 info docker client version is 17.04.0-ce 

11:01:46 info docker server version is 17.04.0-ce 

11:01:46 info docker is already installed 

11:01:46 info selfsigncertificate = undefined 

11:01:46 info etcd_port=4001 

11:01:46 info tf_rest_port=8010 

11:01:46 info caffe_jupyter_port=8001 

11:01:46 info caffe_rest_port=8000 

11:01:46 info js_jupyter_tf_port=31002 

11:01:46 info js_jupyter_caffe_port=31001 

11:01:46 info js_ui_port=31000 

11:01:46 info username = root 

11:01:46 info type = multi 

11:01:46 info stage = 1 

11:01:46 info Version: 14.04 

11:01:46 info OS: ubuntu 

11:01:46 info DLSDK_ETCD_CONTAINER: intelcorp/dl-training-tool:etcd 

11:01:46 info DLSDK_JS_CONTAINER: intelcorp/dl-training-tool:js-release3-latest 

11:01:46 info DLSDK_TF_CONTAINER: intelcorp/dl-training-tool:tf-release3-latest 

11:01:46 info DLSDK_CAFFE_CONTAINER: intelcorp/dl-training-tool:caffe-release3-mlsl-latest 

11:01:46 info Installer version: 1.0.1062  

11:01:46 info output:  

11:01:44 info run: <password>tools\win\plink.exe 115.28.226.116 -l root -pw <password> -batch -t -ssh chmod +x dlsdk_install_scripts/*.sh && echo '<password>' | sudo -S -E -k dlsdk_install_scripts/install_training_tool.sh -stage 1 -type multi -username 'root' -startport 31000 -selfsigncertificate 'undefined'  

11:01:44 info ---------------------------------------- install docker to Linux server if needed 

11:01:44 info User didnt provide PFX file. 

11:01:44 info  

11:01:44 info output:  

11:01:30 info run: tools\win\pscp.exe -scp -batch -P 22 -pw <password> -r dlsdk_install_scripts root@115.28.226.116: 

11:01:30 info ---------------------------------------- upload install shell scripts to the server 

11:01:30 info plink output:  

11:01:28 info echo y | tools\win\plink.exe -l root 115.28.226.116 -pw passwd -ssh "exit" 

11:01:28 info ---------------------------------------- cache ssh-rsa server fingerprint key

 

Does anyone know how I can solve this? Thanks a lot in advance.

Regards

 

Error: 503 Docker containers cannot be started

I get the same error when I install the new version of the SDK, and now my older version no longer works either.

Can anyone please help me solve the error above?

Error: 503 Docker containers cannot be started

This appears when installing the SDK.

Hi,

The training shows as completed and displays information and testing, but it doesn't show the model analysis and dataset.

The screenshot is attached below.

Thanks

Hi,

This is indeed weird - I wasn't able to reproduce it on my machine.
Please try the following:

1. Clear your browser cache (Ctrl+Shift+Delete, then clear the cached files)
2. Try to open a different model and then return to the model you ran
3. Refresh the page and re-enter the tool

Please let us know if this still happens.
If there is still a problem, please attach a screenshot of the model's "details" tab.

Barak.

Hello. I got an error while installing the training tool. I suspect it is because I didn't install it in one go: I would download part of it one day, stop the process, and continue the next day. I'm not sure how to fix it. Here's the output:

altair_daisy@dextro:~/Downloads/dl-sdk-training-tool-1.0.1131$ sudo bash install_training_tool.sh
Installer version: 1.0.1131
DLSDK_CAFFE_CONTAINER: intelcorp/dl-training-tool:caffe-release4-mlsl-latest
DLSDK_TF_CONTAINER: intelcorp/dl-training-tool:tf-release4-latest
DLSDK_JS_CONTAINER: intelcorp/dl-training-tool:js-release4-latest
DLSDK_ETCD_CONTAINER: intelcorp/dl-training-tool:etcd
OS: ubuntu
Version: 16.04

License agreement
--------------------------------------------------------------------------------
To continue with the installation of this product you are required to accept
the terms and conditions of the End User License Agreement (EULA), see
Intel_Deep_Learing_SDK_EULA.txt file in the package. After reading the EULA,
you must enter 'accept' to continue the installation or 'decline' to exit from
the install script.
--------------------------------------------------------------------------------

Type "accept" to continue or "decline" to exit from the install script: accept

Please enter password to access the Training Tool web interface: [admin] ------------
docker is already installed
docker server version is 17.06.0-ce
docker client version is 17.06.0-ce
isDockerProxyNeeded = false
setting proxy for docker (systemd)
docker installation finished
docker pull intelcorp/dl-training-tool:caffe-release4-mlsl-latest
caffe-release4-mlsl-latest: Pulling from intelcorp/dl-training-tool
Digest: sha256:89d45e1575e04cb741431ba22d3761952482c7ffbfb9dcd4f014de19c8419f9e
Status: Image is up to date for intelcorp/dl-training-tool:caffe-release4-mlsl-latest
docker pull intelcorp/dl-training-tool:tf-release4-latest
tf-release4-latest: Pulling from intelcorp/dl-training-tool
Digest: sha256:43ae4bb9bdee2889916d5ff6182b7ee20d710eb0d7b5dc43afc5f621c34256d5
Status: Image is up to date for intelcorp/dl-training-tool:tf-release4-latest
docker pull intelcorp/dl-training-tool:js-release4-latest
js-release4-latest: Pulling from intelcorp/dl-training-tool
Digest: sha256:62e3d43bad15f1ba0fd648ae8697e1abdc28d0bfb72a1890556ae5763315b5da
Status: Image is up to date for intelcorp/dl-training-tool:js-release4-latest
docker pull intelcorp/dl-training-tool:etcd
etcd: Pulling from intelcorp/dl-training-tool
Digest: sha256:6cb676a494614765b9115983e4f8b9668d06bf03ec840aeb3287dfb17e4fdd09
Status: Image is up to date for intelcorp/dl-training-tool:etcd
docker pull completed
Linux distribution: ubuntu
COMMANMD=remove
+++ [0720 13:37:15] Killing all kubernetes containers...
+++ [0720 13:37:15] Done.
stop containers completed
validating ports
8080 8081 8082 4001 2380 2379 8285 10252 10255 4194 10250 6443 10251
ports validation completed
mkdir -p /home/altair_daisy/dlsdk/
chown -R dlsdk-user:dlsdk-group /home/altair_daisy/dlsdk/
multi node installation
Linux distribution: ubuntu
COMMANMD=master
Reading package lists... Done
Building dependency tree       
Reading state information... Done
curl is already the newest version (7.47.0-1ubuntu2.2).
The following packages were automatically installed and are no longer required:
  linux-headers-4.8.0-36 linux-headers-4.8.0-36-generic
  linux-image-4.8.0-36-generic linux-image-extra-4.8.0-36-generic
  linux-signed-image-4.8.0-36-generic snap-confine
Use 'sudo apt autoremove' to remove them.
0 to upgrade, 0 to newly install, 0 to remove and 1 not to upgrade.
+++ [0720 13:37:19] Killing all kubernetes containers...
+++ [0720 13:37:19] Done.
+++ [0720 13:37:21] K8S_VERSION is set to: v1.5.2
+++ [0720 13:37:21] ETCD_VERSION is set to: 3.0.4
+++ [0720 13:37:21] FLANNEL_VERSION is set to: v0.6.1
+++ [0720 13:37:21] FLANNEL_IPMASQ is set to: true
+++ [0720 13:37:21] FLANNEL_NETWORK is set to: 172.16.0.0/16
+++ [0720 13:37:21] FLANNEL_BACKEND is set to: udp
+++ [0720 13:37:21] RESTART_POLICY is set to: unless-stopped
+++ [0720 13:37:21] MASTER_IP is set to: localhost
+++ [0720 13:37:21] ARCH is set to: amd64
+++ [0720 13:37:22] IP_ADDRESS is set to: 196.249.35.58
+++ [0720 13:37:22] USE_CNI is set to: false
+++ [0720 13:37:22] USE_CONTAINERIZED is set to: false
+++ [0720 13:37:22] --------------------------------------------
+++ [0720 13:37:22] Killing all kubernetes containers...
+++ [0720 13:37:22] Launching docker bootstrap...
!!! [0720 13:37:42] docker bootstrap failed to start. Exiting...
Error on or near line 84; exiting with status 1
