Hardware acceleration of Special Functions.


A Gaussian filter in random sampling could lead to better results than a box filter. The Gaussian curve's falloff resembles more subtle changes in brightness (colour), more like real-world varying fields of colour. The problem with the stochastic approach will be choosing the blurring radius properly, since it cannot take jagged pixels (areas where aliasing occurs) into account because of the randomness of the sampling.

A Gaussian filter in random sampling could lead to better results than a box filter.

at low sampling frequency yes (though I'd prefer raised cosine or Lanczos 2 over Gaussian), but with adaptive sampling you have a high sampling frequency in high-frequency signal areas, thus the local radius of your reconstruction filter must be very small to avoid excessive blurring, and in this case you will miss the farthest samples (from the pixel center) within the pixel area

pixels on your screen are rectangular areas after all, so a box reconstruction filter makes more sense in practice (with supersampling) than some theoretical texts may let you think when reasoning about only discrete reconstruction samples

anyway this is user selectable, and, since you have to ask, the default look is probably not that bad
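
A minimal Python sketch of the trade-off described above, not code from any renderer discussed in this thread. The function names, the sample positions and the sigma value are all assumptions chosen for illustration. It reconstructs one pixel from four sub-pixel samples, once with a box filter and once with a narrow Gaussian, to show how a small Gaussian radius nearly ignores the samples farthest from the pixel center.

```python
import math

def box_weight(dx, dy, half_width=0.5):
    """Box filter: every sample inside the pixel square counts equally."""
    return 1.0 if abs(dx) <= half_width and abs(dy) <= half_width else 0.0

def gaussian_weight(dx, dy, sigma):
    """Isotropic Gaussian falloff with distance from the pixel center."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))

def reconstruct(samples, weight):
    """Weighted average of (dx, dy, value) samples.

    dx and dy are offsets from the pixel center, in pixel units.
    """
    total_weight = sum(weight(dx, dy) for dx, dy, _ in samples)
    return sum(weight(dx, dy) * v for dx, dy, v in samples) / total_weight

# Hypothetical pixel: two bright samples near the center,
# two dark samples near opposite corners.
samples = [
    (-0.45, -0.45, 0.0),
    (-0.10, 0.10, 1.0),
    (0.10, -0.10, 1.0),
    (0.45, 0.45, 0.0),
]

box = reconstruct(samples, box_weight)
gauss = reconstruct(samples, lambda dx, dy: gaussian_weight(dx, dy, sigma=0.15))
print(f"box: {box:.4f}  gaussian(sigma=0.15): {gauss:.4f}")
```

With these assumed samples the box filter averages to 0.5, while the narrow Gaussian lands close to 1.0 because the corner samples receive almost no weight.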

@bronxzv
Slightly off-topic question. In one of your posts you mentioned that the Kribi project has 500 timers based on the rdtsc instruction. Is it possible to post the proper usage of the rdtsc instruction in those timers? It could be posted as a code template.
Best regards
Iliya

@bronxzv Slightly off-topic question. In one of your posts you mentioned that the Kribi project has 500 timers based on the rdtsc instruction. Is it possible to post the proper usage of the rdtsc instruction in those timers? It could be posted as a code template. Best regards Iliya

IIRC this was mentioned in a private post

unfortunately this is a closed-source framework so I'm not allowed to post source code from it

the key advantage is that it makes it easy to install nested stopwatches in our source code with a simple (single-line) notation, then after profile runs it reports the detailed number of cycles and % in a nicely formatted report, for example with indentation for inner timings

for the actual usage of RDTSC it simply follows the advice from the paper I posted the other day, nothing special or innovative there
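
Since the framework itself is closed source, the following is only a sketch of the general idea of nested stopwatches with an indented report. It is not the Kribi API: the `Profiler` class and its `timer` and `report` methods are hypothetical names. Python cannot issue RDTSC directly, so `time.perf_counter_ns()` stands in for the cycle counter here. In C or C++ the tick source would typically be the `__rdtsc()` intrinsic instead.

```python
import time
from contextlib import contextmanager

class Profiler:
    """Hypothetical nested-stopwatch helper (not the Kribi implementation)."""

    def __init__(self, clock=time.perf_counter_ns):
        self.clock = clock      # tick source; stand-in for an RDTSC read
        self.records = []       # [name, depth, elapsed_ticks], in start order
        self.depth = 0          # current nesting level

    @contextmanager
    def timer(self, name):
        """Single-line notation: `with prof.timer("name"):` around a region."""
        record = [name, self.depth, 0]
        self.records.append(record)
        self.depth += 1
        start = self.clock()
        try:
            yield
        finally:
            record[2] = self.clock() - start
            self.depth -= 1

    def report(self):
        """Ticks and % of the top-level total, indented by nesting depth."""
        total = sum(t for _, d, t in self.records if d == 0) or 1
        lines = []
        for name, depth, ticks in self.records:
            percent = 100.0 * ticks / total
            lines.append(f"{'  ' * depth}{name}: {ticks} ticks ({percent:.1f}%)")
        return "\n".join(lines)

# Usage: an outer timer with one nested inner timer.
prof = Profiler()
with prof.timer("frame"):
    with prof.timer("shade"):
        sum(i * i for i in range(10000))
print(prof.report())
```

The tick source is passed in as a parameter so that the same bookkeeping works with any monotonic counter.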

Quoting bronxzv

integration. Even if FP-instructions on these microcontrollers cause "lightweight" Traps only ~3 clock cycles are needed to complete a vector fetch.

what was a "vector fetch" on such an ancient purely scalar chip?..
In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained
from a 256-entry vector table.
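
A small Python sketch of that lookup, as an illustration only. The trap number, the handler names and the returned string are hypothetical and are not taken from the Am29K manuals. The table is simply an array of 256 handler addresses (callables here). The "vector fetch" is the indexed read, and it is separate from the time spent inside the handler.

```python
VECTOR_TABLE_SIZE = 256   # table size mentioned in the post above

def unhandled_trap(number):
    """Default entry: nothing installed for this vector."""
    raise RuntimeError(f"no handler installed for vector {number}")

def emulate_fadd(number):
    """Placeholder for a software routine that emulates an FP add."""
    return "FADD emulated in software"

# Every slot holds exactly one handler "address" (one callable).
vector_table = [unhandled_trap] * VECTOR_TABLE_SIZE

FADD_TRAP = 42            # hypothetical trap number, for illustration only
vector_table[FADD_TRAP] = emulate_fadd

def take_trap(number):
    """Fetch the handler from the table, then run it."""
    handler = vector_table[number]   # the "vector fetch": one indexed load
    return handler(number)           # the cost of the routine itself comes on top

print(take_trap(FADD_TRAP))
```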

>>...do you know how many cycles were required for emulating basic fp instructions like FADD and FMUL?..

No. I just checked a 29K family User's Manual and I have not found any technical details regarding "number of cycles to execute an instruction".

Best regards,
Sergey

In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained
from a 256-entry vector table

Why is it called a "vector table"? This is simply a data structure which holds scalar values, not vector values.

In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained from a 256-entry vector table.

so it's only (part of?) the time to branch to the trap handler, it tells us nothing about the speed of the actual routine

No. I just checked a 29K family User's Manual and I have not found any technical details regarding "number of cycles to execute an instruction".

it was probably something like 50-100 cycles for FP32 FMUL and FADD (based on my past experience writing FP emulation routines)
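
To give a feel for why software emulation costs tens of cycles, here is a hedged Python sketch of an FP32 multiply done with integer operations only. It is not the Am29K trap handler and makes no claim about real cycle counts. The function names are hypothetical, and the sketch assumes normal, finite inputs and a normal result (zeros, subnormals, infinities, NaNs, overflow and underflow are not handled). Each step (unpack, integer multiply, normalize, round, repack) would be several integer instructions on a chip without an FPU.

```python
import struct

def f32_bits(x):
    """Bit pattern of x after rounding to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b):
    """Value of a binary32 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def soft_fmul(a_bits, b_bits):
    """Multiply two normal binary32 values using integer arithmetic only.

    Rounds to nearest, ties to even. Special values are NOT handled.
    """
    sign = (a_bits ^ b_bits) & 0x80000000
    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF
    assert 0 < exp_a < 255 and 0 < exp_b < 255, "normal inputs only"

    # Restore the hidden leading 1 to get 24-bit significands.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    product = man_a * man_b            # 47 or 48 significant bits
    exponent = exp_a + exp_b - 127     # remove one copy of the bias

    # Normalize so that the kept significand has exactly 24 bits.
    if product & (1 << 47):
        shift = 24
        exponent += 1
    else:
        shift = 23
    mantissa = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)

    # Round to nearest, ties to even.
    if remainder > half or (remainder == half and (mantissa & 1)):
        mantissa += 1
        if mantissa == 1 << 24:        # rounding carried out of 24 bits
            mantissa >>= 1
            exponent += 1

    assert 0 < exponent < 255, "overflow/underflow not handled"
    return sign | (exponent << 23) | (mantissa & 0x7FFFFF)

result = bits_f32(soft_fmul(f32_bits(1.5), f32_bits(2.5)))
print(result)
```

FADD needs the same unpack, round and repack steps plus an alignment shift of the smaller operand, so a similar order of magnitude is plausible.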

http://en.wikipedia.org/wiki/Interrupt_vector

Yes I know this.
My question was slightly different. Members of the IDT (IVT in DOS) are addresses, i.e. a single binary number representing an address in memory. Judging by the definition of a vector, each IDT entry should have been composed of a few values (addresses), but this is not the case.
I do not know why Intel decided to call it a vector.

My question was slightly different. Members of the IDT (IVT in DOS) are addresses, i.e. a single binary number representing an address in memory. Judging by the definition of a vector, each IDT entry should have been composed of a few values (addresses), but this is not the case.

ah I see what you mean, I have no idea why it's called a vector, I'd consider the whole table as a vector but not each individual address as you said, unlike the common usage

It could be a vector when you consider the whole IDT.
Every IDT "vector" points to an 8-byte descriptor, which itself could be represented as a vector composed of various fields.
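
For comparison, a short Python sketch that decodes both kinds of entry from raw bytes. The function names and the example byte values are assumptions made for illustration. The layouts are the standard ones: a real-mode IVT entry is 4 bytes (16-bit offset, then 16-bit segment), and a 32-bit protected-mode IDT gate is 8 bytes split into offset low, selector, a reserved byte, a type/attribute byte and offset high.

```python
import struct

def decode_ivt_entry(raw):
    """Real-mode IVT entry: 16-bit offset followed by 16-bit segment."""
    offset, segment = struct.unpack("<HH", raw)
    return {
        "segment": segment,
        "offset": offset,
        "linear": (segment << 4) + offset,   # real-mode address formation
    }

def decode_idt_gate(raw):
    """32-bit protected-mode IDT gate descriptor (8 bytes, several fields)."""
    off_lo, selector, _reserved, type_attr, off_hi = struct.unpack("<HHBBH", raw)
    return {
        "offset": (off_hi << 16) | off_lo,   # handler entry point
        "selector": selector,                # code segment selector
        "present": bool(type_attr & 0x80),   # P bit
        "dpl": (type_attr >> 5) & 0x3,       # descriptor privilege level
        "gate_type": type_attr & 0xF,        # 0xE = 32-bit interrupt gate
    }

# Hypothetical example entries, built by hand for illustration.
ivt_entry = decode_ivt_entry(struct.pack("<HH", 0x0100, 0xF000))
idt_gate = decode_idt_gate(struct.pack("<HHBBH", 0x5678, 0x0008, 0, 0x8E, 0x1234))
print(ivt_entry)
print(idt_gate)
```

Seen this way, the IVT entry is a single segment:offset address, while the IDT gate really is a small record of several fields.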

Quoting iliyapolak

In other words, every time an interrupt or trap occurs, the address of some routine has to be obtained
from a 256-entry vector table

Why is it called a "vector table"...
I think AMD uses the term "Vector Table" because Intel calls a similar structure an "Interrupt Descriptor Table".
It looks like this is a "War of Terms" and the same applies to Oracle and Informix, etc.

Quoting bronxzv: ...it was probably something like 50-100 cycles for FP32 FMUL and FADD (based on my past experience writing FP emulation routines)...

It looks realistic. Why don't we call AMD and ask? :)

I think AMD uses the term "Vector Table" because Intel calls a similar structure an "Interrupt Descriptor Table"

It was called the IVT, i.e. "Interrupt Vector Table", and we are talking about DOS and the 8086 CPU.
Judging by the definition of a vector as a unit composed of a number of scalars, neither the IVT nor the IDT can be called a vector, because these structures contain scalar member fields which point to descriptors (in the case of the IDT).

Quoting iliyapolak

I think AMD uses the term "Vector Table" because Intel calls a similar structure an "Interrupt Descriptor Table"

It was called the IVT, i.e. "Interrupt Vector Table", and we are talking about DOS and the 8086 CPU...
Does it change the essence of interrupt or trap processing for Am29K microcontrollers?

It is not related to AMD 29k microcontrollers. The question is why a scalar member is called a vector.

Quoting Sergey Kostrov: Quoting bronxzv: ...it was probably something like 50-100 cycles for FP32 FMUL and FADD (based on my past experience writing FP emulation routines)...

It looks realistic. Why don't we call AMD and ask? :)

I've sent an email to AMD and this is the core part of my e-mail:

Q:
My question is related to a legacy RISC microcontroller Am29200. The microcontroller supports 18 floating-point instructions and
all of them are emulated using traps:

How many clock cycles are needed to execute FADD or FSUB instructions?

and here is a response from AMD:

A:
AM29xxx products have not belonged to AMD since 2005. You should contact Spansion for any questions on these products:

http://www.spansion.com/support/ses/ses.html

A:
AM29xxx products have not belonged to AMD since 2005. You should contact Spansion for any questions on these products

I was not sure that AMD would answer your e-mail so fast. It looks like they sold their microcontroller division to Spansion.

Quoting iliyapolak

A:
AM29xxx products have not belonged to AMD since 2005. You should contact Spansion for any questions on these products

I was not sure that AMD would answer your e-mail so fast. It looks like they sold their microcontroller division to Spansion.

Intel sold a similar division (some CPUs for embedded platforms) to Marvell.

@Sergey

I think that we can come to the conclusion that there is no such thing as hardware-accelerated special functions.
