For a typical profiling run of about two hours, VTune spends around 50 minutes on database conversion after the run completes, before the results are displayed. Checking 'top' on my shiny new 16-core Tigerton system, I see 15 cores sitting idle for those 50 minutes while a single Java thread runs on one core.
As the apostles of multi-threading, are you considering a multi-threaded version of VTune? It sure would be nice to see the results in something close to 1/16th the time.
PS: PTU also uses single-threaded Java data conversion, but it is faster.