Hi all,
I set the following OpenMP thread parameters in bash before training CIFAR-10. However, Intel Caffe appears to ignore them and falls back to its default of 64 threads. In addition, the threads are scattered rather than placed compactly.
Can anyone tell me where I am going wrong?
Environment variables set before running the CIFAR-10 training:
export KMP_HW_SUBSET=64c,4t
export KMP_AFFINITY=verbose,granularity=fine,compact
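One sanity check worth running first, as a rough sketch: confirm that both variables are actually exported, so that a child process such as the training script can inherit them. The helper name `check_exported` is made up for illustration, and this only checks the shell environment, not what Caffe does with it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a variable is visible to child processes.
# printenv runs as a child process, so it only sees exported variables.
check_exported() {
    local name="$1" value
    if value="$(printenv "$name")"; then
        echo "$name is exported: $value"
    else
        echo "$name is NOT exported"
    fi
}

export KMP_HW_SUBSET=64c,4t
export KMP_AFFINITY=verbose,granularity=fine,compact

check_exported KMP_HW_SUBSET
check_exported KMP_AFFINITY
```

If either line reports "NOT exported", the variable was set without `export`, or in a different shell from the one that launches the training script.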
The CIFAR-10 solver prototxt sets the engine to MKL2017:
cat examples/cifar10/cifar10_full_sigmoid_solver_bn.prototxt
# reduce learning rate after 120 epochs (60000 iters) by factor of 10
# then another factor of 10 after 10 more epochs (5000 iters)
engine:"MKL2017"
# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of CIFAR10, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 10
# Carry out testing every 1000 training iterations.
test_interval: 1000
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
#weight_decay: 0.004
# The learning rate policy
lr_policy: "step"
gamma: 1
stepsize: 5000
# Display every 200 iterations
display: 100
# The maximum number of iterations
max_iter: 60000
# snapshot intermediate results
snapshot: 10000
snapshot_prefix: "examples/cifar10_full_sigmoid_bn"
# solver mode: CPU or GPU
solver_mode: CPU
Command executed:
./examples/cifar10/train_full_sigmoid_bn.sh
Output log:
I0927 11:14:16.258430 14454 cpu_info.cpp:468] OpenMP environmental variables are specified: no
I0927 11:14:16.258483 14454 cpu_info.cpp:471] OpenMP thread bind allowed: yes
I0927 11:14:16.258538 14454 cpu_info.cpp:474] Number of OpenMP threads: 64
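For comparison with the 64 threads reported above: `KMP_HW_SUBSET=64c,4t` describes 64 cores with 4 hardware threads each, i.e. 256 hardware threads made available to the OpenMP runtime. A small sketch of that arithmetic follows; the helper name `expected_threads` is hypothetical, and it only handles the simple `Nc,Mt` form used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hardware threads described by a "Nc,Mt" KMP_HW_SUBSET value.
expected_threads() {
    local subset="$1" cores threads
    cores="${subset%%c,*}"     # text before "c,"     -> 64
    threads="${subset##*,}"    # text after the comma -> 4t
    threads="${threads%t}"     # drop the trailing t  -> 4
    echo $(( cores * threads ))
}

expected_threads "64c,4t"
```

The log line "OpenMP environmental variables are specified: no" together with a thread count of 64 (one per core) suggests the settings were not seen or not recognized by the process, rather than applied in part.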
Platform:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Thread(s) per core: 4
Core(s) per socket: 64
Socket(s): 1
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 87
Model name: Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping: 1
CPU MHz: 1098.957
BogoMIPS: 2600.02
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
NUMA node0 CPU(s): 0-255
NUMA node1 CPU(s):
Thanks.