Building Python

Hi everyone,

I just wanted to ask if anyone's got any experience building Python with icc / icpc?
I'm gonna try and post if I run into trouble of any kind or what I had to do to get it to work.
I'll try Python 3.1.1 with computed gotos ;) Btw, I'm running Ubuntu 9.10 amd64 on a Core2Duo T7200 notebook.

Cheers,
nuku

Hi

The best way is to run a test and come back here afterwards if a problem occurs.

Kind regards

I was intending to do that anyway ;)
I used the official CPython 3.1.1 tarball without any customizations.
My machine: Intel Core2Duo T7200, 2GB Ram, in a Dell Inspiron 9400 running Ubuntu 9.10 amd64, Kernel 2.6.30-20
So here's what I ran into.
The command lines I used were:
./configure --with-computed-gotos --without-gcc CC=icc CXX=icpc CCFLAGS="-O3"
make CC=icc CXX=icpc CCFLAGS="-O3"
make test CC=icc CXX=icpc CCFLAGS="-O3"
failed (basically, none of the modules built). Same if I leave out all parameters except for CC=icc. So I scanned through the roughly 13,000 lines of output and two things occurred to me:
1.) It can't find libimf.so, so I need to pass LD_LIBRARY_PATH manually. (for me, it was /opt/intel/Compiler/11.1/069/lib/intel64/)
2.) _ctypes module uses libffi which uses __int128_t which is not included in 11.1.069.
I decided to fix libimf first. So I did a make again.
Result: everything built except for _ctypes, and it missed some necessary bits to build _dbm, _gdbm, _sqlite3, _tkinter, and readline. I decided not to care about the latter, so if you need them you'll need to find out yourself ;)
Then, I patched ./Modules/_ctypes/libffi/src/x86/ffi64.c exactly as described in step 7 here: http://software.intel.com/en-us/articles/build-firefox-35-with-intel-c-c...
So I did a make again, and all it did was build libffi and ctypes, so this one went really quick.
Then, I did a make test to see if it was all right.

There are some tests failing, but it looks like 95% of the failures are for NaN being treated incorrectly... If you know how to fix this, please drop me a line here. Thanks.

So, basically what you need to do is:
- of course, pass CC=icc and whatever else you want to have
- pass LD_LIBRARY_PATH=/your/path/to/libimf.so/ when configuring (I don't know whether it's necessary to pass it to make, so I did and it didn't do any harm...)
- patch your libffi as described above

The compiler library /lib/intel64/ will be on LD_LIBRARY_PATH if you set up correctly, e.g. by 'source /opt/intel/Compiler/11.1/069/bin/iccvars.sh' (before running configure).
It's unfortunate that icc doesn't accept the point of view of gcc on __int128_t. If you don't want to hack around this, you could perform that part of the build with gcc.
If you want icc to treat NaN "correctly," you must start by setting appropriate options (-fp-model precise, or possibly -fp-model source).
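
To double-check that libimf.so is really reachable through LD_LIBRARY_PATH before running configure, something like the following sketch can help. It is only an illustration: the helper name find_on_library_path is made up here, and it looks at LD_LIBRARY_PATH only, not at the other places the dynamic loader searches.

```python
import os


def find_on_library_path(lib_name, library_path=None):
    """Return the LD_LIBRARY_PATH directories that contain lib_name.

    library_path is an os.pathsep-separated string; by default the
    current LD_LIBRARY_PATH is used. (Helper name is hypothetical.)
    """
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = []
    for directory in library_path.split(os.pathsep):
        # Skip empty entries; only report directories that hold the file.
        if directory and os.path.isfile(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    found = find_on_library_path("libimf.so")
    if found:
        print("libimf.so found in:", ", ".join(found))
    else:
        print("libimf.so is not on LD_LIBRARY_PATH")
```

If this prints that the library is missing, sourcing iccvars.sh (or setting LD_LIBRARY_PATH by hand, as described above) has not taken effect in the current shell.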

Ok, I did some benchmarking with pybench.
Compiler flags used: -O3 -msse3
Comparing gcc and icc with exactly the same flags, GCC is about 5 percent faster! See http://bpaste.net/show/4371/

Comparing it to the standard Ubuntu 2.6.4 python, it's about 10% performance increase, but well, that isn't a fair comparison...
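
For a quick cross-check next to pybench, a tiny timeit script can be run under both interpreters and the numbers compared by hand. This is just a sketch, not a replacement for pybench: the two statements timed and the helper names are arbitrary choices for illustration.

```python
import timeit


def time_snippet(stmt, setup="pass", repeat=5, number=100000):
    """Best-of-`repeat` wall time in seconds for running stmt `number` times."""
    return min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))


def percent_faster(baseline_seconds, candidate_seconds):
    """Positive result: the candidate is that many percent faster than the baseline."""
    return (baseline_seconds - candidate_seconds) / baseline_seconds * 100.0


if __name__ == "__main__":
    setup = "a = 1.5; b = 2.5"
    for label, stmt in [("float compare", "a < b"), ("float add", "a + b")]:
        print("%-14s %.6f s" % (label, time_snippet(stmt, setup)))
```

Run it once with the gcc build and once with the icc build, then feed the two timings for the same label into percent_faster to get a figure comparable to the percentages quoted in this thread.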

Hi
About the 5% difference:
Try the -fast flag with icc, if the build works with it.
best regards

Hey Nuku, I've got a couple of questions. First, can I assume that once you sourced the iccvars file (as Tim described) you were able to build without problems? Also, what kind of machine were you running on, i.e. Linux, Windows, what type of CPU? Were you comparing two Pythons that you built (i.e. same tarball, one built with gcc and one with icc) or was the gcc one one of the published executables?

I'm interested in trying it myself and seeing if I can reproduce the performance you're seeing.

Thanks!

Dale

Hey Dale,
My machine: Intel Core2Duo T7200, 2GB Ram, in a Dell Inspiron 9400 running Ubuntu 9.10 amd64, Kernel 2.6.30-20
I used the official CPython 3.1.1 tarball without any customizations.
Somehow, the iccvars file failed, I tried it multiple times, but this may be a result of me running an unsupported OS... I just added symlinks to all the libs in that dir to /usr/lib and the bin dir to my path, and everything worked just fine then.
And I compiled it myself with GCC, from the same tarball, compiler flags see above.

I'll try bustaf's idea in a minute.

It would be cool if we could share some flags or whatever it takes to make it faster so we can build the ultimately fast python :D

Nuku

It might help if you would tell us your ideas on how you want to speed it up. Without a plan, it seems unlikely that the superior auto-vectorization of icc would contribute, or that we could expect you to apply OpenMP to take advantage of the Intel library.

I read through the icc help and chose quite a lot of optimization options.
IPO seems to fail: http://bpaste.net/show/4402/
So the -fast option doesn't work.
The flags I chose are:

-O3 -mssse3 -no-prec-div -static -xSSSE3 -vec -fomit-frame-pointer -unroll-agressive -opt-multi-version-aggressive -vec-guard-write -opt-malloc-options=1 -opt-calloc -mkl -openmp -fp-model=precise -fp-speculation=safe -inline-level=2 -w

Please don't beat me if there's something idiotic in there, I'm a real newbie concerning optimization ;)
Thoughts behind those flags: I need precise floating-point arithmetics and maximum speed, executable size doesn't matter.

I noticed I had to do a make clean when changing flags, running ./configure again with updated flags wasn't enough.
Leaving out ipo gives me another error: http://bpaste.net/show/4403/

I really don't know what causes this and would be glad if you or someone else could help me ;)

Hi nuku
Strange that icc is slower than GNU, but if the build goes wrong with those optimization flags, that is normal.
I have a Fedora 12 (64-bit) machine at hand with the Intel compiler installed (Intel 64 processor),
and when I have time I will run a test to verify, but I doubt I can get better results than you.
Also a question about your netbook model:
Does your touchpad work well, or is it catastrophic (too sensitive) like on some other netbooks?
Is the wireless module a Ralink type? Or, please,
can you show me the output of (lspci > file)?
I think call this model.
Best regards

Hi bustaf,

Yeah I know that optimizations often break code, but it would take years to find out which flag or combination of flags breaks it in this case ;) So basically what I meant is if someone around here has come across a similar situation, what their approach and/or solution to it was ;)

Well, the notebook I'm using for compiling is over three years old by now and of course isn't sold any more.
It's a Dell Inspiron 9400, Core2Duo T7200, 2GB RAM, 200GB HD, GeForce Go 7900GS, Intel 3945ABG Wireless.
Everything's ok with that machine^^
But as you are referring to netbooks (the Dell is 17", I wouldn't call that a netbook :D), I do also have an Asus UL30A, Core2Duo SU7300, 4GB RAM, with an Atheros wireless card. That one has a weird touchpad which takes some getting used to, but works fine if you're used to it^^ I don't really know what you want, so it's a bit hard^^ Hope I could be of help anyway.

nuku

Hi nuku
Thanks for your answers about the hardware.
The essential point, GNU or ICC, is that you gained 10% with a new build from source,
which is already good compared to the distro's default binary.
Best regards

Well 10% compared to an old Version (2.6.4) is nothing that counts as a success. It may be that Python 3 is just 10% (or more) faster than 2.6 ;)
Cheers

Hi
Agreed, maybe that is possible here.
For most tasks in my job I am obliged to build from new, verified sources, for every library or utility that I may share with my own new programs.
In 99% of cases I observe results 5% to 15% better (or more) compared with the original distro binary, but that is not true for everything.
Note that it is rare for old versions to be slower than new ones; I rather observe the opposite, with all the new functions that get added or are needed.
Luckily processors evolve.
Best regards

The IPO error is explained in the first IPO diagnostic.

ipo: warning #11053: libpython3.1.a is an archive, but has no symbols (this can happen if ar is used where xiar is needed)

You need to get the makefiles to use the Intel-provided xiar and xild for the archiver and linker instead of ar and ld. I would assume the configure script has options to override those.

The other error you posted is a symbol that comes from our OpenMP* runtime library libiomp5.so. You mentioned you had LD_LIBRARY_PATH set, so that should be picked up if you're picking up libimf.so, so not sure why there's a problem there.

Finally I would mention that you want to be a bit discriminating if you can about where you use -fp-model=precise. This does eliminate some optimizations so I would only use it where needed. I might play around with different levels of -O as well, sometimes -O1 or -O2 can be better than -O3 depending on the application. I might also try removing the -inline-level option and let the compiler's default inlining heuristic take over unless you have a good reason for adding that. And just fyi, -vec and -fomit-frame-pointer are already enabled by default, so you don't need to specify them.

Brandon Hewitt Technical Consulting Engineer For 1:1 technical support: http://premier.intel.com Software Product Support info: http://www.intel.com/software/support
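
One way to try the xiar suggestion without editing makefiles is to hand the tool names to configure through the environment. The sketch below only prepares that environment. It assumes, like the post above does, that Python's configure script honours an AR setting. The helper name and the CONFIGURE_CMD list are illustrative, and the options in it are the ones already used earlier in this thread.

```python
import os


def intel_toolchain_env(base_env=None):
    """Return a copy of the environment with the Intel tools selected.

    CC/CXX/AR values are the tool names mentioned in this thread;
    the input mapping is not modified. (Helper name is hypothetical.)
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({"CC": "icc", "CXX": "icpc", "AR": "xiar"})
    return env


# Options taken from the configure line posted at the top of the thread.
CONFIGURE_CMD = ["./configure", "--with-computed-gotos", "--without-gcc"]

if __name__ == "__main__":
    env = intel_toolchain_env()
    print("would run:", " ".join(CONFIGURE_CMD))
    print("with CC=%(CC)s CXX=%(CXX)s AR=%(AR)s" % env)
    # To actually run it from the source directory:
    #   import subprocess
    #   subprocess.check_call(CONFIGURE_CMD, env=env)
```

Whether the generated Makefile then really calls xiar is worth confirming by grepping it for "AR=" after configure has finished.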

Hi Nuku and All ...
I downloaded the Python-3.1.2 sources and quickly tested a build with -fast (ipo).
The result is OK, but other parameters still need to be changed.
I am not familiar with Python; to go further properly I would need to read part of the source or
ask some of my friends who know it.

LANG=C;
export LANG
LD_LIBRARY_PATH=/opt/intel/Compiler/11.1/064/lib/intel64;
export LD_LIBRARY_PATH
CC="/opt/intel/Compiler/11.1/069/bin/intel64/icc"
export CC
CXX="/opt/intel/Compiler/11.1/069/bin/intel64/icpc"
export CXX
LD="/opt/intel/Compiler/11.1/069/bin/intel64/xild"
export LD
AR="/opt/intel/Compiler/11.1/069/bin/intel64/xiar"
export AR
Remarks:
(xild is not used; the -shared-intel flag is not used.)
(LD does not seem to be an environment variable the Makefile reads, or does it need to be passed some other way?)

1] run ./configure
2] open the Makefile and change line 61 to:
OPT= -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -shared -fast

make
ipo is OK, but I think you have the experience to add the other parameters that are still missing.

Python build finished, but the necessary bits to build these modules were not found:
_dbm _gdbm _sqlite3
_tkinter bz2
To find the necessary bits, look in setup.py in detect_modules() for the module's name.

Failed to build these modules:
_bisect _codecs_cn _codecs_hk
_codecs_iso2022 _codecs_jp _codecs_kr
_codecs_tw _collections _csv
_ctypes _ctypes_test _curses
_curses_panel _elementtree _hashlib
_heapq _json _lsprof
_multibytecodec _multiprocessing _pickle
_random _socket _ssl
_struct _testcapi array
atexit audioop binascii
cmath crypt datetime
fcntl grp itertools
math mmap nis
operator ossaudiodev parser
pyexpat readline resource
select spwd syslog
termios time unicodedata
zlib

running build_scripts
creating build/scripts-3.1
copying and adjusting /usr/src/download/Python-3.1.2/Tools/scripts/pydoc3 -> build/scripts-3.1
copying and adjusting /usr/src/download/Python-3.1.2/Tools/scripts/idle3 -> build/scripts-3.1
copying and adjusting /usr/src/download/Python-3.1.2/Tools/scripts/2to3 -> build/scripts-3.1
changing mode of build/scripts-3.1/pydoc3 from 644 to 755
changing mode of build/scripts-3.1/idle3 from 644 to 755
changing mode of build/scripts-3.1/2to3 from 644 to 755

make install etc ....
machine is an Intel 2 cores
kernel is: 2.6.31.12-174.2.22.fc12.x86_64
Operating system is Fedora 12.

Good luck finishing it properly.
Kind regards

Looks like you're missing some library. I had the same problem until I correctly integrated the icc libraries (this should be in your PATH or something similar, or in your LD_LIBRARY_PATH).
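
After a build that reports failed modules, a quick way to see which extension modules the new interpreter can actually load is to try importing them. This is a minimal sketch: the function name is made up, and the module names in the demo are just a few picked from the list above.

```python
import importlib


def modules_that_fail_to_import(names):
    """Try to import each module name; return {name: error message} for failures."""
    failures = {}
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError as exc:
            # A missing shared library usually shows up in this message.
            failures[name] = str(exc)
    return failures


if __name__ == "__main__":
    candidates = ["math", "time", "_socket", "_ctypes", "zlib"]
    failed = modules_that_fail_to_import(candidates)
    for name in candidates:
        print("%-10s %s" % (name, failed.get(name, "ok")))
```

Run it with the freshly built interpreter, not the system one, so the import errors reflect the icc build.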

Hi Nuku
You wrote:
(So the -fast option doesn't work.)
I ran a test with -fast on my side, just to help you (as a reference).
It is interesting, but I don't actually need Python for my job...
(Also, I have since removed icc from the customer's machine...)
Best regards.

I'm sorry, bustaf, but I don't quite get what you are trying to tell me...
So you tried it with the "-fast" option, and the above is what you got?

Hi
You can see that with xiar (as suggested by Brandon Hewitt) and -fPIC, which is required for the -fast option, the build finishes, but with some failures.
If I wanted to build Python with icc without failures, I could do it without help.
I understand all the failures reported and how to fix them; I just don't have the time to run several test builds for something I don't need to use.
If you don't understand my poor English, it is not important.
Bests regards.

OK, I finally got a chance to spend some time on this. In addition to the need to set AR=xiar as already pointed out in this thread, there's also the problem with __int128_t with a workaround described in http://software.intel.com/en-us/forums/showthread.php?t=56652&o=a&s=lr.

There is one other issue related to -ipo that I found. Essentially, when you use -ipo some functions get inlined everywhere they are called in the 'python' executable, so the compiler is removing the original body. Unfortunately some of the shared libs depend on some of these functions, causing problems with a lot of loadable modules. I don't have a workaround for this yet, so for now you should avoid -ipo, but I'll work with the developers to get this fixed.

Thanks!

Dale

I'm sorry, I was quite busy lately (found a small old robot of mine based on an ATmega8^^), but I'll try AR=xiar later today.
I already found the solution to the problem with __int128_t in a small tutorial on how to build Firefox (which also uses libffi) with icc. Anyway, even if there was no fix, it wouldn't be so tragic, as it only affects the ctypes module.

As for ipo - so the compiler inlines functions which are referenced by modules - did I understand this correctly?

Thanks a lot!

Lorenz

>As for ipo - so the compiler inlines functions which are referenced by modules - did I understand this correctly?

Yes, essentially. The problem comes about not from the inlining itself, but from the fact that after inlining everywhere in the "python" executable, it sees no calls to the function and therefore eliminates it. The use of the -export-dynamic switch at link time is supposed to prevent this. When it's fixed, it should still do the inlining but leave the symbol intact and dynamically visible.

I'll post here when a fix for that is available, in the meantime you should be able to build without -ipo. If you want to use "-fast" without "-ipo", see "icc -help" for the definition of "-fast" and you can use those options explicitly for now, leaving out "-ipo".

Dale
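
To see whether a given C-API function is still dynamically visible in a finished build, one rough check is to look the symbol up through ctypes from inside that interpreter. This is a sketch under assumptions: it needs a working _ctypes module in the build being tested, the helper name is invented, and the symbol names in the demo are ordinary C-API functions chosen as examples, not the ones that actually went missing with -ipo.

```python
import ctypes


def interpreter_exports(symbol):
    """True if `symbol` can be resolved in the running interpreter process."""
    try:
        # ctypes.pythonapi resolves names against the Python C API.
        getattr(ctypes.pythonapi, symbol)
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    for name in ["Py_GetVersion", "PyFloat_FromDouble", "PyLong_FromLong"]:
        status = "visible" if interpreter_exports(name) else "MISSING"
        print("%-20s %s" % (name, status))
```

A symbol reported as MISSING here would be one that extension modules cannot link against at load time, which matches the failure mode described above.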

OK, thank you!
And well, I need precise floating point numbers for numerical applications, so I'll have to live with the loss of optimizations by -fp-model precise and -fp-speculation=safe... - You can't have everything, right? (Pity, though :D)
Now I tried the following:
FLAGS="-O3 -mssse3 -no-prec-div -static -xSSSE3 -opt-multi-version-aggressive -vec-guard-write -opt-malloc-options=1 -opt-calloc -mkl -openmp -fp-model precise -fp-speculation=safe -mp1"
./configure --with-computed-gotos --without-gcc CC=icc CXX=icpc CFLAGS=$FLAGS AR=xiar
make CC=icc CXX=icpc CFLAGS=$FLAGS AR=xiar
I got this: http://bpaste.net/show/5035/ (click "raw" for the actual output).
Any idea what went wrong here? I played around with omitting several flags, and suddenly it started telling me that my C compiler was broken (couldn't find a shared library), so I restarted with a clean source tarball (I got the recent version, 3.1.2, instead of 3.1.1) and it stopped complaining. It didn't change anything about the above error, though. Stuck again...

Looking at the raw output, the problem seems to be an undefined reference to __kmpc_begin, which is probably because you built objects with -openmp but didn't link with it.

icc -DNDEBUG -g  -O3 -Wall -Wstrict-prototypes   Parser/acceler.o Parser/grammar1.o Parser/listnode.o Parser/node.o Parser/parser.o Parser/parsetok.o Parser/bitset.o Parser/metagrammar.o Parser/firstsets.o Parser/grammar.o Parser/pgen.o Objects/obmalloc.o Python/mysnprintf.o Parser/tokenizer_pgen.o Parser/printgrammar.o Parser/pgenmain.o -lresolv -ldl  -lutil -o Parser/pgen
Parser/pgenmain.o: In function `main':
Parser/pgenmain.c:(.text+0x2c): undefined reference to `__kmpc_begin'
make: *** [Parser/pgen] Error 1

Did you add openmp code to python? I don't see any omp pragmas doing a quick search myself, so I don't think there's any point to having -openmp on the command line.

What worked for me was just setting CFLAGS to "-xHOST -O3 -no-prec-div -static", but as you said you may need to set fp-model appropriately for your needs. If I were you I'd start with -O2 or -O3 and work up from that, rather than having a long list of options that you think you might need. Then with each change (if it works) measure the effect to see if it matters. If it doesn't work, please let us know so we can investigate it. If it does work and it makes a difference, that would also be useful information.

I must confess I don't know exactly how the fp-model setting would affect calculations in python. Are floating point calculations simply passed to C functions? If so then I can see where it could affect the results you get when running a python program that does fp calculation, but it would be good to verify the effects with some fp test suite of python code. If you can illuminate me on that it would be appreciated.

Thanks!
Dale
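
On the question of how fp-model reaches Python code: CPython stores a float as a C double and does the arithmetic in compiled C, so the compiler's floating-point settings can show up in Python-level results. A few spot checks like the ones below can be run under each build. They are a sketch, not a real fp test suite, and the particular expressions are chosen only for illustration.

```python
import math
import sys


def float_sanity_report():
    """Collect a few values that depend on how the C double arithmetic behaves."""
    return {
        # 53 mantissa bits means an IEEE-754 binary64 double.
        "mant_dig": sys.float_info.mant_dig,
        "epsilon": sys.float_info.epsilon,
        # Classic rounding example; should not be exactly 0.3.
        "sum_01_02": 0.1 + 0.2,
        # Goes through the C library's sqrt.
        "sqrt2_squared": math.sqrt(2.0) ** 2,
    }


if __name__ == "__main__":
    for key, value in sorted(float_sanity_report().items()):
        print("%-14s %r" % (key, value))
```

If the printed values differ between the gcc build and the icc build, that is a hint that the fp-model setting (or the math library in use) is changing results.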

Hi,
uhm the OpenMP thing seems to have somehow slipped in, complete nonsense. Thank you for that ;)
For the fp precision, I ran some included self-tests (make test) and lots of them failed for either fp precision or incorrect handling of NaN.
And I not only forgot linking against OpenMP, but also MKL, so I left that one out for now, too.
Then, the build finished cleanly, though with some fun compiler warnings like this one:
/.../pyicc312/Modules/pyexpat.c(1768): warning #1419: external declaration in primary source file
PyMODINIT_FUNC MODULE_INITFUNC(void); /* avoid compiler warnings */
:D
Flags were:
-O3 -mssse3 -no-prec-div -static -xSSSE3 -opt-multi-version-aggressive -vec-guard-write -opt-malloc-options=1 -opt-calloc -fp-model precise -fp-speculation=safe -mp1
I also did a build with only -O3 right in the beginning...
I compared the above build to a gcc build (Py 3.1.1, just -O3 -msse3), and gcc was around 2% faster, but gcc wasn't limited in optimizations in floating point numbers. Details: http://bpaste.net/show/5037/
Fun fact: icc is 7% faster in comparing floats, even if its floating point optimizations are disabled^^
I am currently running the self tests to see how big the difference in fp precision actually is, but these take awfully long...

OK, test results are in:
305 tests OK.
4 tests failed:
    test_cmath test_ctypes test_math test_memoryview
27 tests skipped:
    test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_curses test_dbm_gnu test_dbm_ndbm test_kqueue test_nis test_ossaudiodev test_pep277 test_smtpnet test_socketserver test_sqlite test_startfile test_tcl test_timeout test_tk test_ttk_guionly test_ttk_textonly test_urllib2net test_urllibnet test_winreg test_winsound test_xmlrpc_net test_zipfile64
6 skips unexpected on linux2:
    test_dbm_ndbm test_ttk_guionly test_tcl test_tk test_ttk_textonly test_dbm_gnu
I only tested the four failed ones with the gcc compiled python, and all of them were OK.
This is what went wrong:
test_math:
self.assertTrue(math.isnan(math.atan2(0., NAN)))
AssertionError: False is not True
AssertionError: 1.1102230246251565e-16 != 0.0
test_cmath:
self.assertTrue(math.isnan(phase(z)))
AssertionError: False is not True
AssertionError: acos1004: acos(complex(0.0, nan))
Expected: complex(1.5707963267948966, nan)
Received: complex(1.5707963267948966, 0.0)
Received value insufficiently close to expected value.
test_memoryview:
AssertionError: array('i', [97, 98, 97, 98, 97, 102]) != array('i', [97, 98, 97, 98, 99, 102])
AssertionError: array('i', [97, 98, 97, 98, 97, 102]) != array('i', [97, 98, 97, 98, 99, 102])
AssertionError: array('i', [97, 98, 97, 98, 97, 102]) != array('i', [97, 98, 97, 98, 99, 102])
AssertionError: bytearray(b'ababaf') != bytearray(b'ababcf')
AssertionError: bytearray(b'ababaf') != bytearray(b'ababcf')
AssertionError: bytearray(b'ababaf') != bytearray(b'ababcf')
test_ctypes:
AssertionError: 15.0 != 21
AssertionError: 15 != 21
AssertionError: 133 != 139
AssertionError: 0.333333333333 not less than 0.01
In test_cmath, z is one of the following (for z in complex nans:):
complex_nans = [complex(x, y) for x, y in [
    (NAN, -INF), (NAN, -2.3), (NAN, -0.0), (NAN, 0.0), (NAN, 2.3), (NAN, INF),
    (-INF, NAN), (-2.3, NAN), (-0.0, NAN), (0.0, NAN), (2.3, NAN), (INF, NAN)
]]
I have no idea as to what the hell went wrong, but obviously something did ;)
Cheers
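
The NaN failures above can be reproduced outside the test suite with a few lines, which makes it quicker to retry after changing compiler flags. The sketch below repeats three of the checks quoted from test_math and test_cmath. The function name is made up and the selection of checks is only a sample.

```python
import cmath
import math

NAN = float("nan")


def nan_propagation_checks():
    """Return {description: bool}; every value should be True on a correct build."""
    z = cmath.acos(complex(0.0, NAN))
    return {
        "atan2(0., NAN) is nan": math.isnan(math.atan2(0.0, NAN)),
        "phase(complex(NAN, 0.0)) is nan": math.isnan(cmath.phase(complex(NAN, 0.0))),
        "acos(complex(0.0, NAN)).imag is nan": math.isnan(z.imag),
    }


if __name__ == "__main__":
    for description, ok in sorted(nan_propagation_checks().items()):
        print("%-40s %s" % (description, "ok" if ok else "FAILED"))
```

Any FAILED line under the icc build, with the gcc build printing ok, points at the same NaN handling problem that test_math and test_cmath report.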

OK, I'll try running "make test" and see what I find. fp comparisons can be a real pain.

Thanks!

Dale

Yeah, I see some fails in my icc-built python. Let me investigate and get back to you.

Dale

First problem, the patch I used for ffi64.c (which I got from http://software.intel.com/en-us/articles/build-firefox-35-with-intel-c-c...) has a bug, where it does the post increment of ssecount twice instead of once. I don't know if you have the same problem or not, but you might want to check that. I still have other failures, so I'll continue investigating.

Dale

BTW, Feilong has fixed the problem on http://software.intel.com/en-us/articles/build-firefox-35-with-intel-c-c..., so if you look there for the problem, you might be confused as to what I'm talking about. All better now.

Dale

Hm yeah no I used the old version. Fixed it and ctypes-tests are no longer going wrong ;) The other ones still aren't fixed, naturally. Thanks for keeping at it!

Maybe I'm missing something, but it seems to me there's a problem with the python build. Regardless of whether I set CFLAGS in the environment or on the makefile line (or both) it doesn't get passed to setup.py properly, so when I set "-fp-model precise" the modules still get built without it. If I run setup.py by hand and set CFLAGS in the environment, then it seems to work, though it still adds its own (e.g. if I set "-O0", it calls "icc -O3 -O0").

Does anyone else know if maybe there's some other way I should be setting CFLAGS for the module builds?

Thanks!

Dale

I just asked a friend who's pretty well into Python and this is what he came up with:
"See the README. Set OPT to influence the optimization flags; set EXTRA_CFLAGS otherwise.", according to a guy called Martin von Löwis, he told me.
In Misc/HISTORY it says on line threethousandfourhundredandsomething, under "What's new in Python 2.5 Alpha 1":
- EXTRA_CFLAGS has been introduced as an environment variable to hold compiler flags that change binary compatibility. Changes were also made to distutils.sysconfig to also use the environment variable when used during compilation of the interpreter and of C extensions through distutils.
And some three thousand lines further on:
- On systems which build using the configure script, compiler flags which used to be lumped together using the OPT flag have been split into two groups, OPT and BASECFLAGS. OPT is meant to carry just optimization- and debug-related flags like "-g" and "-O3". BASECFLAGS is meant to carry compiler flags that are required to get a clean compile. On some platforms (many Linux flavors in particular) BASECFLAGS will be empty by default. On others, such as Mac OS X and SCO, it will contain required flags. This change allows people building Python to override OPT without fear of clobbering compiler flags which are required to get a clean build.
So looks like there is quite some tidiness in those makefiles, though it's a bit hidden...
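
To see which of these variables a finished build actually recorded, the interpreter can be asked directly. The sketch below uses the sysconfig module, which newer Python versions provide; on 3.1, distutils.sysconfig.get_config_var should serve the same purpose. The function name and the choice of variables are only illustrative.

```python
import sysconfig


def build_flag_report(names=("CC", "OPT", "BASECFLAGS", "CFLAGS")):
    """Map each build variable name to the value recorded at build time (or None)."""
    return {name: sysconfig.get_config_var(name) for name in names}


if __name__ == "__main__":
    for name, value in sorted(build_flag_report().items()):
        print("%-12s %r" % (name, value))
```

If "-fp-model precise" shows up in CFLAGS or OPT here but the extension modules were still compiled without it, that would support the suspicion that setup.py is not being handed the same flags.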

Nuku, I'm no expert on computation but I've tried every possible combination that comes to my mind in order to build Python with Intel compilers in Ubuntu 10.10. Here's what I've tried:

./configure --with-gcc=icc --with-cxx-main=icc \
--prefix=/home/user/python \
OPT="-O3 -xHost"

./configure --with-gcc=icc --with-cxx-main=icpc \
--prefix=/home/user/python \
OPT="-O3 -xHost"

./configure --with-computed-gotos --without-gcc CC=icc CXX=icpc CCFLAGS="-O3" \
--prefix=/home/user/python

After configuring the installation I type make install and everything goes ok, at least no error messages appear. Nonetheless I still cannot get Python to be built with the Intel compiler.... here's what I get when I run the executable:

Python 2.6.2 (r262:71600, May 27 2011, 11:50:43)

[GCC 4.4.5] on linux2

When I should be getting something like this:

Python 2.6.2 (r262:71600, Jun 30 2010, 19:36:03)

[GCC Intel C++ gcc 4.1 mode] on linux2

Do you have any ideas about what I'm doing wrong? Thanks in advance,

César.
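
The compiler string in that banner can also be read from inside the interpreter, together with the path of the binary that is actually running. This sketch may help rule out the possibility (only a guess here) that the shell is starting a different python than the one installed under the chosen prefix. The function name is made up.

```python
import platform
import sys


def describe_interpreter():
    """Report which binary is running and which compiler it says it was built with."""
    return {
        "executable": sys.executable,
        "version": platform.python_version(),
        "compiler": platform.python_compiler(),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%-11s %s" % (key, value))
```

Running it by full path, e.g. with the python binary under the --prefix directory given to configure, shows whether that particular build reports GCC or the Intel compiler.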
