Intel Caffe OpenMP Threads on Xeon Phi

Hi All,

I am setting the following OpenMP thread parameters in bash before training CIFAR-10. However, Intel Caffe overrides them and falls back to its default of 64 threads. Also, the threads are scattered instead of compact.

Can anyone please share where I am going wrong?

Environment variables set before running CIFAR-10 training:

export KMP_HW_SUBSET=64c,4t
export KMP_AFFINITY=verbose,granularity=fine,compact
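A quick check before suspecting Caffe itself is whether these variables actually reach child processes. A variable that is assigned without export, or exported in a different terminal session, is invisible to any program you start. Below is a minimal sketch that uses only standard shell tools. The variable list is illustrative and is not necessarily the list Intel Caffe inspects.

```shell
# Report which OpenMP-related variables a child process would inherit.
# The list is illustrative; it is NOT necessarily what Intel Caffe checks.
report_omp_env() {
  local var
  for var in OMP_NUM_THREADS KMP_AFFINITY KMP_HW_SUBSET; do
    # printenv is an external program, so, like Caffe, it only sees
    # variables that were exported into its environment.
    if printenv "$var" >/dev/null; then
      echo "$var=$(printenv "$var")"
    else
      echo "$var is not exported"
    fi
  done
}

report_omp_env
```

Run it in the same shell, or the same script, that launches Caffe. A "not exported" line there means Caffe cannot see that setting either.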

The CIFAR-10 solver prototxt sets the engine to MKL2017:

cat examples/cifar10/cifar10_full_sigmoid_solver_bn.prototxt

# reduce learning rate after 120 epochs (60000 iters) by factor of 10
# then another factor of 10 after 10 more epochs (5000 iters)
engine:"MKL2017"
# The train/test net protocol buffer definition
net: "examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of CIFAR10, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 10
# Carry out testing every 1000 training iterations.
test_interval: 1000
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
#weight_decay: 0.004
# The learning rate policy
lr_policy: "step"
gamma: 1
stepsize: 5000
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 60000
# snapshot intermediate results
snapshot: 10000
snapshot_prefix: "examples/cifar10_full_sigmoid_bn"
# solver mode: CPU or GPU
solver_mode: CPU

Command executed:

./examples/cifar10/train_full_sigmoid_bn.sh

Output log:

I0927 11:14:16.258430 14454 cpu_info.cpp:468] OpenMP environmental variables are specified: no
I0927 11:14:16.258483 14454 cpu_info.cpp:471] OpenMP thread bind allowed: yes
I0927 11:14:16.258538 14454 cpu_info.cpp:474] Number of OpenMP threads: 64
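To check these three OpenMP lines quickly after each run, a small filter over the Caffe log can help. The sketch below assumes the message texts match the lines quoted in this thread, and that you saved the run's output to a file (train.log is a hypothetical name).

```shell
# Extract the OpenMP status that Intel Caffe's cpu_info.cpp logs at startup.
# Reads a Caffe log on stdin; message texts are copied from the logs above.
omp_status() {
  sed -n \
    -e 's/.*OpenMP environmental variables are specified: \([a-z]*\).*/env_specified=\1/p' \
    -e 's/.*OpenMP thread bind allowed: \([a-z]*\).*/bind_allowed=\1/p' \
    -e 's/.*Number of OpenMP threads: \([0-9]*\).*/threads=\1/p'
}

# Usage, assuming the output was captured, e.g. with
#   ./examples/cifar10/train_full_sigmoid_bn.sh 2>&1 | tee train.log
# omp_status < train.log
```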

Platform:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                256
On-line CPU(s) list:   0-255
Thread(s) per core:    4
Core(s) per socket:    64
Socket(s):             1
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 87
Model name:            Intel(R) Xeon Phi(TM) CPU 7210 @ 1.30GHz
Stepping:              1
CPU MHz:               1098.957
BogoMIPS:              2600.02
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
NUMA node0 CPU(s):     0-255
NUMA node1 CPU(s):
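The counts behind the 64 versus 256 thread figures in this thread can be read straight from lscpu. Below is a minimal sketch that multiplies sockets, cores per socket, and threads per core. Leading blanks are tolerated because some lscpu versions indent these fields.

```shell
# Derive physical cores and hardware threads from `lscpu` output on stdin.
count_cores() {
  awk -F: '
    /^[ \t]*Socket\(s\)/          { sockets = $2 + 0 }
    /^[ \t]*Core\(s\) per socket/ { cores   = $2 + 0 }
    /^[ \t]*Thread\(s\) per core/ { tpc     = $2 + 0 }
    END {
      printf "physical_cores=%d hw_threads=%d\n", sockets * cores, sockets * cores * tpc
    }
  '
}

# Usage: lscpu | count_cores
```

For the Xeon Phi 7210 listing above, this gives 1 x 64 = 64 physical cores and 64 x 4 = 256 hardware threads.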

Thanks.

Dear Chetan

Please set OMP_NUM_THREADS=<number of physical cores - 2> and KMP_AFFINITY=verbose,granularity=fine,compact.
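This rule of thumb can be written as a tiny helper. In the sketch below, the function name and the floor of one thread for very small core counts are illustrative assumptions and are not part of the advice above.

```shell
# Physical cores minus 2, leaving two cores for other processes.
# The floor of 1 thread for tiny core counts is an illustrative assumption.
omp_threads_for() {
  if [ "$1" -gt 2 ]; then
    echo $(( $1 - 2 ))
  else
    echo 1
  fi
}

export OMP_NUM_THREADS="$(omp_threads_for 64)"   # 62 on the 64-core Xeon Phi 7210
export KMP_AFFINITY=verbose,granularity=fine,compact
```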

 

Thanks

Anand

Hi Anand,

I still get the following log, which says the OpenMP variables have not been specified. I do export the environment variables before running the training session.

I0928 10:31:03.746587 33142 cpu_info.cpp:465] GPU is used: no
I0928 10:31:03.746639 33142 cpu_info.cpp:468] OpenMP environmental variables are specified: no
I0928 10:31:03.746726 33142 cpu_info.cpp:471] OpenMP thread bind allowed: yes
I0928 10:31:03.746785 33142 cpu_info.cpp:474] Number of OpenMP threads: 64

Is there a restriction on the number of threads? Can't I have more than 62 threads (I have 64 physical cores)?

Thanks.

Dear Chetan,

The OpenMP environment variables have to be set as follows:

export OMP_NUM_THREADS=<number of cores, i.e. 64 or 68 depending on the core count of your Intel Xeon Phi x200 processor>
export KMP_AFFINITY=granularity=fine,compact,1,0

Ideally, leave 2 cores free for other miscellaneous processes, but the maximum you can specify is the number of cores. You may see performance differences in caffe time with different OMP_NUM_THREADS values.
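The compact,1,0 form differs from plain compact through its permute argument. Per Intel's KMP_AFFINITY documentation, permute=1 makes the thread-context level the most significant, so consecutive OpenMP threads go to different cores before any core gets a second thread. The sketch below is a simplified model for a single-socket part that ignores tiles and NUMA. It is meant only to illustrate the ordering and does not reproduce the runtime's exact placement.

```shell
# Simplified model: where does OpenMP thread $1 land on a machine with
# $2 cores and $3 hardware threads per core, under KMP_AFFINITY mode $4?
# Ignores sockets, tiles and NUMA; illustrative only.
place_thread() {
  local tid=$1 cores=$2 tpc=$3 mode=$4
  case "$mode" in
    compact)     echo "core=$(( tid / tpc )) context=$(( tid % tpc ))" ;;
    compact,1,0) echo "core=$(( tid % cores )) context=$(( tid / cores ))" ;;
    *)           echo "unsupported mode: $mode" >&2; return 1 ;;
  esac
}
```

Under this model, 64 threads with compact,1,0 occupy context 0 of all 64 cores. Plain compact packs the same 64 threads onto 16 cores, four contexts each.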

 

Thanks

Anand

Hi Anand,

These environment variables have no effect on Intel Caffe. I set 32 threads, but Intel Caffe still runs 64 threads. Why?

Also, why can't I run more than 64 threads on the Xeon Phi 7210? The system supports 256 threads. Is this a Caffe restriction?

The same OpenMP environment works fine for applications built with MKL and the ICC compiler. Could this be a bug in Intel Caffe?

Thanks.

Hi Anand,

Any suggestions on this? I am still not able to take advantage of OpenMP threads, even after setting the environment variables. The same environment variables do work for other MKL benchmarks such as Intel MKL LINPACK.

Please suggest solutions.

Thanks.

Dear Chetan,

The number of threads to set is <NO OF PHYSICAL CORES>, and it seems Intel Caffe defaults to the number of physical cores irrespective of the value you set for OMP_NUM_THREADS. Is this behavior causing you any concerns? Kindly let me know if this needs to be taken up with a product SME.

Thanks

Anand

Hi Anand,

Yes. They can also reach out to me directly if required.

For the Intel team's benefit:

First: I have 64 physical cores with 4 threads per core (Xeon Phi 7210), so 256 threads in all.
Second: Intel Optimized Caffe is supposed to make use of OpenMP threads, which are also a key feature of Xeon Phi.

I know that by default Intel Optimized Caffe runs 64 threads, which it does. But shouldn't the number of threads change based on the OpenMP environment variables (KMP_HW_SUBSET and KMP_AFFINITY)? Whatever values I set for these two variables, the thread count and affinity type stay the same: 64 threads with scatter affinity. I also want to explore other ways of mapping threads. Is that possible with Intel Caffe?
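One way to explore other mappings is to time the same model under several KMP_HW_SUBSET settings, with OMP_NUM_THREADS matched to cores x threads per core each time. The sketch below only generates the settings. The 1, 2 and 4 threads-per-core values are an illustrative choice, and whether Intel Caffe honors them is exactly the question of this thread.

```shell
# Print one "KMP_HW_SUBSET=... OMP_NUM_THREADS=..." line per threads-per-core value.
sweep_configs() {
  local cores=$1 t
  for t in 1 2 4; do
    echo "KMP_HW_SUBSET=${cores}c,${t}t OMP_NUM_THREADS=$(( cores * t ))"
  done
}

# Usage: $cfg is left unquoted on purpose so env receives two NAME=VALUE words;
# stdin is redirected so Caffe does not consume the loop's input.
# sweep_configs 64 | while read -r cfg; do
#   env $cfg ./build/tools/caffe time \
#     --model=models/bvlc_alexnet/train_val.prototxt --engine=MKLDNN </dev/null
# done
```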

I think it should be possible, because at the start of training/testing Intel Caffe prints the following log. To me, or any other user, it means Intel Caffe could not read the environment variables that were set. Why?

I0928 10:31:03.746587 33142 cpu_info.cpp:465] GPU is used: no
I0928 10:31:03.746639 33142 cpu_info.cpp:468] OpenMP environmental variables are specified: no
I0928 10:31:03.746726 33142 cpu_info.cpp:471] OpenMP thread bind allowed: yes
I0928 10:31:03.746785 33142 cpu_info.cpp:474] Number of OpenMP threads: 64

Thanks.

Dear Chetan,

It looks like you have not specified OMP_NUM_THREADS before running caffe time/train. Please export OMP_NUM_THREADS and run caffe time. It will definitely work; I verified it, and it seems to pick up the OMP variables.

Thanks

Anand

Hi Anand,

It doesn't work.

Step 1: export OMP_NUM_THREADS=2T
Step 2: export KMP_AFFINITY=verbose,compact,granularity=fine
Step 3: ./build/tools/caffe time --model=models/bvlc_alexnet/train_val.prototxt --engine=MKLDNN
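The OpenMP specification expects OMP_NUM_THREADS to be a positive integer, or a comma-separated list of them for nested levels. How a runtime reacts to anything else is implementation-specific. The value 2T, as posted in Step 1, would not pass that format. A guard like the following sketch can be run before the timing command.

```shell
# Succeed only for a positive integer or a comma-separated list of them,
# the form the OpenMP specification describes for OMP_NUM_THREADS.
valid_omp_num_threads() {
  printf '%s\n' "$1" | grep -Eq '^[1-9][0-9]*(,[1-9][0-9]*)*$'
}

# Usage, before Step 3:
# valid_omp_num_threads "${OMP_NUM_THREADS:-}" \
#   || echo "OMP_NUM_THREADS='${OMP_NUM_THREADS:-}' is not a valid thread count" >&2
```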

I see 64 threads, one per physical core, running in scatter mode.

Do you see the highlighted line below (OpenMP environmental variables are specified) as "yes" in your log? Can you please share, step by step, how you are exporting the variables and which model you are testing OpenMP with?

I1003 08:44:55.058910 38661 cpu_info.cpp:453] Processor speed [MHz]: 1300
I1003 08:44:55.059046 38661 cpu_info.cpp:456] Total number of sockets: 1
I1003 08:44:55.059109 38661 cpu_info.cpp:459] Total number of CPU cores: 64
I1003 08:44:55.059164 38661 cpu_info.cpp:462] Total number of processors: 256
I1003 08:44:55.059219 38661 cpu_info.cpp:465] GPU is used: no
I1003 08:44:55.059273 38661 cpu_info.cpp:468] OpenMP environmental variables are specified: no  
I1003 08:44:55.059334 38661 cpu_info.cpp:471] OpenMP thread bind allowed: yes
I1003 08:44:55.059567 38661 cpu_info.cpp:474] Number of OpenMP threads: 64
I1003 08:44:55.059926 38661 net.cpp:806] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I1003 08:44:55.060099 38661 net.cpp:806] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy

Thanks.

Hi Anand,

It worked after I put the environment variables and the Intel Caffe training command in a bash script and executed it. Before, I was running export on the bash command line and then executing Intel Caffe, and that process was not picking up the environment variables.

Keeping all the commands in a single bash file helped. You may close this thread.
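A launcher of this kind can be sketched as follows, with the variables exported and Caffe started from one script. The default values are assumptions: 62 follows the physical-cores-minus-2 suggestion earlier in the thread for a 64-core part. Any value the caller has already exported takes precedence.

```shell
#!/usr/bin/env bash
# Export the OpenMP settings and start the given command from the same
# script, so the launched process inherits them. Defaults are assumptions.
run_with_omp() {
  # Keep a value the caller already exported; otherwise fall back to a default.
  export OMP_NUM_THREADS="${OMP_NUM_THREADS:-62}"
  export KMP_AFFINITY="${KMP_AFFINITY:-verbose,granularity=fine,compact}"
  "$@"
}

# Example, using the command from this thread:
# run_with_omp ./build/tools/caffe time \
#   --model=models/bvlc_alexnet/train_val.prototxt --engine=MKLDNN
```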

Thanks.

Hi Anand,

I am facing a new issue with this. The system configuration is the same, but the Intel Caffe log now shows that it is not able to bind OpenMP threads. I have tried many things to solve this, but nothing is working.

The day before, everything was fine, and now the Caffe log suddenly shows this. Can you suggest how to solve it?

I1028 15:48:39.768147 11504 cpu_info.cpp:453] Processor speed [MHz]: 1300
I1028 15:48:39.768308 11504 cpu_info.cpp:456] Total number of sockets: 1
I1028 15:48:39.768383 11504 cpu_info.cpp:459] Total number of CPU cores: 64
I1028 15:48:39.768942 11504 cpu_info.cpp:462] Total number of processors: 256
I1028 15:48:39.769050 11504 cpu_info.cpp:465] GPU is used: no
I1028 15:48:39.769290 11504 cpu_info.cpp:468] OpenMP environmental variables are specified: yes
I1028 15:48:39.769366 11504 cpu_info.cpp:471] OpenMP thread bind allowed: no
I1028 15:48:39.816463 11504 cpu_info.cpp:474] Number of OpenMP threads: 32

OpenMP thread bind allowed should be "yes". I am not sure why it is not able to bind the threads even after they are created. Because of this, all threads go to core 0 by default, so the OpenMP environment variables are not being used.

I tested OpenMP with another benchmark that does not use Caffe, and the spawned threads are mapped correctly.
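To check whether all threads really land on core 0, one option is to sample which CPU each Caffe thread last ran on, using the PSR column of procps ps. Note that PSR is a snapshot of where a thread last ran, not its affinity mask, so a few samples taken during training give a fairer picture. The sketch below only counts the distinct CPUs in that output.

```shell
# Count distinct CPUs in the second column of `ps -L -o tid=,psr= -p PID`
# output read on stdin (one line per thread: thread id, last CPU used).
distinct_cpus() {
  awk 'NF >= 2 { seen[$2] = 1 } END { n = 0; for (c in seen) n++; print n }'
}

# Usage while Caffe is running:
# ps -L -o tid=,psr= -p "$(pgrep -n caffe)" | distinct_cpus
```

A result of 1 for a 32-thread run would match the "all threads on core 0" symptom. `taskset -a -p <pid>` from util-linux prints each thread's affinity mask, which complements this snapshot.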

Thanks.

Dear Chetan,

This looks strange, given that you were able to run correctly the previous day. Can you kindly make sure that all unwanted processes are killed and that the OMP variables you set are reflected correctly? I hope there is no change in Makefile.config from the previous version.

Thanks

Anand

Hi Anand,

I don't have a specific answer, but it is working now. Maybe I need to wait a few seconds or minutes for the benchmark to set up the affinity before I observe the output.

You may close this thread.

Thanks.

That's very interesting. I will close this issue.

Thanks

Anand