How many cycles do the new instructions require, and can they be paired with other instructions?
The AES-NI white paper has some performance results from which you could estimate the instruction latencies: http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-aes-instructions-set/. It explicitly mentions that they are pipelined too.
You can expect pclmulqdq to perform the same as other vector multiplications.
It varies by implementation. You can take your AES kernel(s) and run them through the Intel Architecture Code Analyzer to understand throughput, latency, etc.: http://software.intel.com/en-us/articles/intel-architecture-code-analyzer/
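If you would rather measure than estimate, a rough micro-benchmark can show the difference between latency and throughput on your own machine. The sketch below is only an illustration and is not taken from the white paper or the analyzer. It assumes GCC or Clang on x86-64, and the helper names (`cpu_has_aesni`, `aesenc_serial`, `aesenc_interleaved`, `aesenc_timing_demo`) are made up for this example. It runs the same AESENC work twice: once as one dependent chain per block, and once with four independent blocks interleaved so a pipelined unit can overlap them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <cpuid.h>       /* __get_cpuid (GCC/Clang) */
#include <x86intrin.h>   /* AES-NI intrinsics and __rdtsc (GCC/Clang) */

#define NBLOCKS 4        /* assumption: 4 independent blocks; tune for your CPU */

/* CPUID leaf 1, ECX bit 25 reports AES-NI support. */
static int cpu_has_aesni(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 25) & 1u);
}

/* Arbitrary, distinct test data (not real plaintext). */
static void fill_blocks(__m128i b[NBLOCKS]) {
    for (int j = 0; j < NBLOCKS; j++)
        b[j] = _mm_set_epi32(j + 4, j + 3, j + 2, j + 1);
}

/* Serial: finish all rounds on one block before starting the next.
   Each AESENC depends on the previous result, so this is latency-bound. */
__attribute__((target("aes,sse2")))
static void aesenc_serial(__m128i b[NBLOCKS], __m128i key, long rounds) {
    for (int j = 0; j < NBLOCKS; j++) {
        __m128i x = b[j];
        for (long i = 0; i < rounds; i++)
            x = _mm_aesenc_si128(x, key);
        b[j] = x;
    }
}

/* Interleaved: same work, but the four chains are independent within
   each iteration, so a pipelined unit can overlap them. */
__attribute__((target("aes,sse2")))
static void aesenc_interleaved(__m128i b[NBLOCKS], __m128i key, long rounds) {
    __m128i x0 = b[0], x1 = b[1], x2 = b[2], x3 = b[3];
    for (long i = 0; i < rounds; i++) {
        x0 = _mm_aesenc_si128(x0, key);
        x1 = _mm_aesenc_si128(x1, key);
        x2 = _mm_aesenc_si128(x2, key);
        x3 = _mm_aesenc_si128(x3, key);
    }
    b[0] = x0; b[1] = x1; b[2] = x2; b[3] = x3;
}

/* Times both variants. __rdtsc counts reference cycles, which may differ
   from core clock cycles, so treat the numbers as relative only. */
static void aesenc_timing_demo(long rounds) {
    assert(rounds > 0);
    if (!cpu_has_aesni()) {
        puts("AES-NI not reported by CPUID");
        return;
    }
    __m128i key = _mm_set_epi32(0x0c0d0e0f, 0x08090a0b, 0x04050607, 0x00010203);
    __m128i s[NBLOCKS], p[NBLOCKS];
    fill_blocks(s);
    fill_blocks(p);

    unsigned long long t0 = __rdtsc();
    aesenc_serial(s, key, rounds);
    unsigned long long t1 = __rdtsc();
    aesenc_interleaved(p, key, rounds);
    unsigned long long t2 = __rdtsc();

    double n = (double)rounds * NBLOCKS;
    printf("serial:      %.2f ref cycles per AESENC\n", (double)(t1 - t0) / n);
    printf("interleaved: %.2f ref cycles per AESENC\n", (double)(t2 - t1) / n);
    printf("results match: %s\n", memcmp(s, p, sizeof s) == 0 ? "yes" : "no");
}
```

Call `aesenc_timing_demo(10000000)` from your own `main` and build with optimization (e.g. `-O2`). On a CPU where the instruction is pipelined, the serial figure should approximate the latency and the interleaved figure should be noticeably lower. The exact numbers depend on the microarchitecture, which is why a static tool or the vendor's tables are still worth checking.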