Vertex shader / interpolator slow with OpenGL (SWVP or HWVP enabled?)

The Developer Guide says that one should use shaders rather than the fixed-function pipeline. I'm still trying to get close to the performance of the fixed-function pipeline in OpenGL.

My GPU is a "Mobile Intel 4 Series Express Chipset Family" (DELL Latitude E6500) with the latest drivers from Intel. I have run some experiments with my lighting shader to improve performance on mobile Intel GPUs. Normally, per-vertex lighting should be much faster than per-fragment lighting because the fragment color is just interpolated between the vertex colors. You therefore do as much work as possible in the vertex shader, because it is executed far less often than the fragment shader.
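As a rough CPU-side illustration of that reasoning (Python rather than shader code; the helper names and the two-vertex edge are assumptions made up for this sketch), per-vertex lighting evaluates the light once per vertex and then only blends the results for each fragment:

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambert(normal, light_dir):
    """Diffuse term for one directional light: max(N . L, 0)."""
    return max(sum(n * l for n, l in zip(normal, light_dir)), 0.0)

def lerp(a, b, t):
    """Linear interpolation, standing in for what the rasterizer does with a varying."""
    return a + (b - a) * t

# Hypothetical edge between two vertices with different normals.
light = normalize((0.0, 0.0, 1.0))
n0 = normalize((0.0, 0.0, 1.0))
n1 = normalize((1.0, 0.0, 0.0))

# Per-vertex lighting: two lighting evaluations in total...
c0, c1 = lambert(n0, light), lambert(n1, light)

# ...and each of the five sample fragments along the edge costs only one lerp.
fragments = [lerp(c0, c1, i / 4) for i in range(5)]
print(fragments)
```

The number of `lambert` calls depends only on the vertex count, which is why this approach is normally expected to be the cheaper one.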

On my Intel GPU, it's different! The vertex shader seems to be much slower than the fragment shader. In particular, passing varyings from the vertex shader to the fragment shader has a huge performance impact. My lighting shader works best if I do only the absolutely necessary work in the vertex shader, pass as little information as possible to the fragment shader, and do everything else there.

As mentioned before, even per-vertex lighting, where just the vertex color is passed to the fragment shader, which only does "gl_FragColor = gl_Color;", is much slower than passing the vertex position and normal to the fragment shader and doing everything there.

I suppose the vertex shader is either implemented in software or runs in software mode by default. Your Developer Guide says in section 3.1:

Support for both Hardware Vertex Processing (HWVP) and Software Geometry Processing (SWGP) is included. SWGP is a superset of software-based processing that includes software vertex processing (SWVP). HWVP peak vertex throughput has been significantly improved in Intel GMA Series 4 twice as fast as the previous generation, and by default, HWVP is enabled.

But I think that is just for DirectX. Is there a way to improve vertex shader performance with OpenGL? Or am I doing something wrong?

Edit: My test scene is a sphere. There seems to be a hard limit beyond which vertex lighting becomes slow. If I set the subdivision / segment count of the sphere above 32, the vertex lighting fps is halved (I assume this happens when the vertex data exceeds 32 KB). The resolution is 640x480 without multisampling. There are three directional lights in the scene, which the shader renders in a single pass.
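The 32 KB guess can be checked with simple arithmetic once a vertex layout is assumed. The post does not give the bytes per vertex, so the strides below are purely hypothetical; the vertex counts for 32 and 33 segments are the ones listed in the results of this post. A minimal Python sketch:

```python
# Vertex counts for 32 and 33 sphere segments, taken from the results in the post.
VERTICES = {32: 1916, 33: 1980}
LIMIT = 32 * 1024  # 32 KiB, the threshold guessed in the post

def buffer_bytes(vertex_count, stride):
    """Size in bytes of a tightly packed vertex buffer."""
    return vertex_count * stride

# Hypothetical strides (assumptions, not from the post):
# 16 B (four floats), 24 B (position + normal as floats), 32 B.
crossing = {}
for stride in (16, 24, 32):
    sizes = {seg: buffer_bytes(n, stride) for seg, n in VERTICES.items()}
    # True only if the buffer grows past the limit between 32 and 33 segments.
    crossing[stride] = sizes[32] <= LIMIT < sizes[33]
    print(stride, sizes, "crosses 32 KiB:", crossing[stride])
```

With these assumed strides the buffer does not cross 32 KiB between 32 and 33 segments, so whether the guess holds depends on the actual vertex layout (and on whether index data counts toward the limit), neither of which is stated here.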

I've tested three different systems:
- DELL Latitude E6500 Notebook (Core2Duo 2.8 GHz, GMA X4500MHD)
- 7 year old Nexoc Osiris E604 Notebook (Pentium M 2.0 GHz, ATi Mobility Radeon 9700)
- Gaming PC (AMD Phenom II X4 955 3.4 GHz, nVidia GTX 275)

Legend:

  • The three columns below a GPU mean: fixed-function pipeline / per vertex lighting / per fragment lighting
  • Values in brackets mean a multi-pass shader, rest single pass
  • Green indicates a shader that does as little as possible in vertex shader, red makes normal use of the vertex shader

Here are the results:

Sphere                    | GMA X4500MHD                          | Radeon 9700M                          | GTX 275
seg.  Triangles  Vertices | fixed func  per vertex   per fragment | fixed func  per vertex   per fragment | fixed func  per vertex   per fragment
30    2142       1794     | 797         600 (323)    297 (231)    | 1009        977 (450)    382 (146)    | 2600        4900 (4900)  4900 (4600)
31    2260       1854     | 796         580 (311)    296 (230)    | 1005        969 (444)    380 (145)    | 2600        4900 (4900)  4900 (4600)
32    2382       1916     | 802         573 (297)    294 (227)    | 1007        967 (439)    379 (144)    | 2600        4900 (4900)  4900 (4600)
33    2508       1980     | 804         243 (140)    294 (111)    | 1006        960 (435)    379 (145)    | 2600        4900 (4900)  4900 (4600)
34    2638       2046     | 803         232 (131)    291 (107)    | 998         945 (428)    375 (143)    | 2600        4900 (4880)  4900 (4600)
37    3052       2256     | 800         200 (112)    287 (90)     | 992         920 (413)    372 (141)    | 2600        4900 (4880)  4900 (4600)
64    8396       4952     | 586         78 (38)      211 (31)     | 892         613 (270)    340 (122)    | 2150        4900 (4850)  4900 (3330)
128   32720      17184    | 226         20 (10)      62 (8)       | 523         239 (83)     279 (87)     | 1300        4900 (2200)  4700 (1439)
256   66208      130512   | 63          5 (2)        16 (2)       | 151         69 (21)      194 (36)     | 543         2410 (680)   2400 (419)
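To put the step between 32 and 33 segments into numbers, here is a small Python sketch (not part of the original measurements; the dictionary and function names are made up for illustration) that uses the single-pass GMA X4500MHD values from the results above:

```python
# Single-pass values on the GMA X4500MHD at 32 and 33 sphere segments,
# copied from the results above as (value at 32, value at 33).
GMA = {
    "fixed func": (802, 804),
    "per vertex": (573, 243),
    "per fragment": (294, 294),
}

def retained(before, after):
    """Fraction of the frame rate that survives the step from 32 to 33 segments."""
    return after / before

ratios = {name: round(retained(a, b), 2) for name, (a, b) in GMA.items()}
print(ratios)
```

Read this way, only the per-vertex shader loses more than half of its frame rate at that step, while the fixed-function and per-fragment paths stay essentially flat.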

Edit2: Ok, I found this document. Is there a way to force the Intel driver to do HWVP? The option "Vertex Processing" in the driver's task bar app has no effect.

Regards

Wow, good timing - I think I've just hit the same thing (see my other post.) My app is running at half the frame rate after conversion from fixed-function to shaders. If what you're saying is true, the vertex shaders could be running on the CPU.

I'm using a desktop Clarkdale part rather than mobile, with the latest drivers. The driver control panel offers three options for vertex processing: "application settings", which presumably applies only to D3D as GL doesn't have an API for this; "default settings"; and "enable software processing" which presumably in fact means "force software processing".

However, all three have the same effect on my app. The doc you found implies the driver will only force SWVP if the application is on a specific list. Perhaps there's a bug here? The doc is also four years old; is any more recent info available?

Did you contact/get a reply from Intel about this? A shame not to after you've put so much effort in!

No, I didn't get any response. The only advice I can give you is to write a single-pass shader and do as little as possible in the vertex unit. Pass everything to the fragment unit (preferably as uniforms, since varyings are slow) and calculate the lighting there.
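A CPU-side sketch of that layout, in Python rather than GLSL, might look like the following. The light directions and function names are invented for the example; the constant list stands in for uniforms and the interpolated normal is the only per-fragment input:

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

# Stand-ins for uniforms: three directional lights that stay constant for the
# whole draw call. The directions are made-up values for this sketch.
LIGHT_DIRS = [normalize(d) for d in ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))]

def shade_fragment(n0, n1, t):
    """Per-fragment lighting: interpolate the normal, then light it."""
    # The only interpolated (varying-like) input is the normal.
    n = normalize(tuple(a + (b - a) * t for a, b in zip(n0, n1)))
    # All three lights are accumulated in a single pass.
    return sum(max(sum(x * y for x, y in zip(n, l)), 0.0) for l in LIGHT_DIRS)

# Fragment halfway between two hypothetical vertex normals.
mid = shade_fragment((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), 0.5)
print(round(mid, 4))
```

Here the lighting work scales with the number of fragments instead of the number of vertices, which is the trade-off the advice accepts on this hardware.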

OK, thanks. I wanted to share as much GL code as possible between my iOS and Windows versions; but if I'm going to have to branch the code to work around this problem I might as well stick with the fixed-function pipeline on Windows, since I don't actually need any custom shaders. In fact I suppose I could still target OpenGL ES 1 for the iOS versions, assuming Apple won't drop support for it in the future.

Regardless, I'll try escalating this issue through Intel and see what they say...
