These are some of the problems that I encountered while installing and training a DeepMask model.
1. module coco not found
Install the Lua coco package with the following steps:
Clone coco repository:
git clone https://github.com/pdollar/coco
Under coco/ run the following command:
luarocks make LuaAPI/rocks/coco-scm-1.rockspec
2. T_END at character 1 when running th train.lua
You need to reinstall Torch7 with Lua 5.2. There is a step-by-step guide explaining how to install Torch with Lua 5.2 at this link.
This solution works fine if you are training DeepMask with the ResNet model; otherwise it can result in the error given below:
3. Failed to load function from bytecode: binary string: not a precompiled chunk
You can resolve this problem by running train.lua with luajit instead of lua, as follows:
luajit train.lua -dm /path/to/pretrained/deepmask
4. cannot open </pretrained/deepmask/model.t7> in mode r
This error occurred while I was using LuaJIT version 2.0.5. Installing LuaJIT version 2.1.0 resolves the issue.
You can install this by following the steps given below:
wget http://luajit.org/download/LuaJIT-2.1.0-beta3.tar.gz
tar zxf LuaJIT-2.1.0-beta3.tar.gz
cd LuaJIT-2.1.0-beta3
make
sudo make install
Add the path for luajit in your .bashrc. Usually Luajit is installed in this path:
5. DataSampler.lua:230: bad argument #3 to 'narrow'
Although I was able to convert my JSON files to t7 files correctly, I kept getting this error during training after some initial epochs. I was able to resolve it by simply rounding off the numbers in my annotation file.
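The rounding step could be scripted along these lines. This is only a minimal sketch: it assumes a COCO-style annotation file with an "annotations" list whose entries carry "bbox" and polygon "segmentation" fields, and it assumes those are the fields that need rounding. The function names and file paths are hypothetical.

```python
import json


def round_annotation_coords(coco):
    """Round bbox and polygon coordinates to whole numbers, in place."""
    for ann in coco.get("annotations", []):
        if "bbox" in ann:
            ann["bbox"] = [int(round(v)) for v in ann["bbox"]]
        seg = ann.get("segmentation")
        # Polygons are lists of flat [x1, y1, x2, y2, ...] lists;
        # RLE masks are dicts and are left untouched.
        if isinstance(seg, list):
            ann["segmentation"] = [[int(round(v)) for v in poly] for poly in seg]
    return coco


def round_annotation_file(src_path, dst_path):
    """Load a JSON annotation file, round its coordinates, write a new file."""
    with open(src_path) as f:
        coco = json.load(f)
    with open(dst_path, "w") as f:
        json.dump(round_annotation_coords(coco), f)


if __name__ == "__main__":
    # Small in-memory example instead of a real annotation file.
    sample = {"annotations": [{"id": 1, "bbox": [10.4, 20.6, 30.2, 40.9],
                               "segmentation": [[1.2, 2.7, 3.1, 4.9]]}]}
    print(round_annotation_coords(sample)["annotations"][0])
```

Writing the result to a new file rather than overwriting the original keeps the unrounded annotations around in case the JSON to t7 conversion has to be repeated.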
6. DataSampler.lua:230: bad argument #2 to 'narrow'
This error occurred because of negative values in the segmentation key for some images. After resolving this error, I was able to run training correctly.
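A quick way to locate such annotations is to scan the polygon coordinates for negative values. The sketch below assumes a COCO-style file with polygon segmentations and clamps negatives to 0, which is one possible fix and not necessarily the one used here. The function name is hypothetical.

```python
def clamp_negative_polygons(coco):
    """Clamp negative polygon coordinates to 0; return ids of changed annotations."""
    fixed_ids = []
    for ann in coco.get("annotations", []):
        seg = ann.get("segmentation")
        if not isinstance(seg, list):
            continue  # skip RLE masks (dicts) and missing segmentations
        if any(v < 0 for poly in seg for v in poly):
            ann["segmentation"] = [[max(0, v) for v in poly] for poly in seg]
            fixed_ids.append(ann.get("id"))
    return fixed_ids


if __name__ == "__main__":
    # Small in-memory example: annotation 7 has one negative x coordinate.
    sample = {"annotations": [
        {"id": 7, "segmentation": [[-3, 5, 10, 5, 10, 12]]},
        {"id": 8, "segmentation": [[0, 0, 4, 0, 4, 4]]},
    ]}
    print(clamp_negative_polygons(sample))
```

The returned ids make it easy to inspect the affected images by hand before deciding whether clamping or dropping those annotations is the better choice.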
7. Bad argument # 2 to '?'
This was not a problem in my JSON file. My dataset initially had only one category, while in DeepMask the number of categories is hardcoded to 80. So I had to change this line of code for training to run correctly:
local cat,ann = torch.random(80)
Changing 80 to 1 resolved the issue.
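Before editing the hardcoded value, it can help to check how many categories the annotation file actually contains. The following is a minimal sketch assuming a COCO-style file with a "categories" list and a "category_id" on each annotation. The function name is hypothetical.

```python
def summarize_categories(coco):
    """Return the category ids declared in the file and those used by annotations."""
    declared = sorted(cat["id"] for cat in coco.get("categories", []))
    used = sorted({ann["category_id"] for ann in coco.get("annotations", [])})
    return {"declared": declared, "used": used}


if __name__ == "__main__":
    # Small in-memory example with a single category.
    sample = {
        "categories": [{"id": 1, "name": "object"}],
        "annotations": [{"id": 10, "category_id": 1},
                        {"id": 11, "category_id": 1}],
    }
    summary = summarize_categories(sample)
    print(len(summary["declared"]), "declared;", len(summary["used"]), "used")
```

If the two lists differ, the annotation file itself is probably worth fixing before touching the sampler code.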
8. Getting 'nan' as loss after a few iterations when training with a custom dataset
This happened because DeepMask was using its default learning rate schedule. I solved this problem by changing my learning rate to 1e-06.