Have you guys seen mimalloc (https://github.com/microsoft/mimalloc)? Its README has some interesting benchmark results for tbbmalloc's peak working set (the last set of graphs on that page). When we first started using jemalloc on Linux and tbbmalloc on Windows, our experience was that the peak working set with tbbmalloc was much worse, and we attributed this to the fact that we allocate on one thread and free on another. To ameliorate this, we resorted to calling scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS) after every simulation time step. The peak working set benchmarks in mimalloc's README.md seem to suggest that tbbmalloc actually holds its own against jemalloc for workloads that do this (see the larsonN and mstressN results). However, the redis benchmark shows tbbmalloc doing much worse than jemalloc. It might be worth investigating the behaviour here.
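For anyone who wants to reproduce the pattern, here is a rough sketch of the allocate-on-one-thread / free-on-another loop with a cleanup at the end of each time step. It is only an illustration under assumptions: it uses plain `std::malloc`/`std::free` (assuming tbbmalloc is serving those, e.g. via its malloc replacement), and the per-step cleanup is passed in as a callback so the sketch builds without TBB. In a build linked against tbbmalloc, that callback would call `scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, nullptr)` from `tbb/scalable_allocator.h`. `run_time_steps` and `StepCleanup` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-step cleanup hook. With tbbmalloc linked in, this is where
// one would call:
//   scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS, nullptr);
// It is injected as a callback so this sketch compiles and runs without TBB.
using StepCleanup = std::function<void()>;

// Simulates `steps` time steps. In each step the calling thread allocates
// `blocks` buffers of `block_size` bytes and a separate worker thread frees
// them. Returns the total number of buffers freed across all steps.
std::size_t run_time_steps(int steps, std::size_t blocks, std::size_t block_size,
                           const StepCleanup& cleanup) {
    std::size_t freed = 0;
    for (int step = 0; step < steps; ++step) {
        std::vector<void*> buffers;
        buffers.reserve(blocks);
        for (std::size_t i = 0; i < blocks; ++i) {
            buffers.push_back(std::malloc(block_size));  // allocated on this thread
        }
        // Hand the buffers to another thread, which releases them.
        std::thread consumer([&buffers, &freed] {
            for (void* p : buffers) {
                std::free(p);  // freed on a different thread
                ++freed;
            }
        });
        consumer.join();  // join() makes the worker's writes to `freed` visible here
        // End of the time step: ask the allocator to return cached memory.
        cleanup();
    }
    return freed;
}
```

For example, `run_time_steps(3, 100, 64, [] { /* per-step cleanup */ })` runs three steps, freeing 100 buffers per step on the worker thread and invoking the cleanup once per step. Keeping the cleanup as a callback also makes it easy to compare peak working set with and without the per-step purge.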
PS. The benchmarks are in a separate repo: https://github.com/daanx/mimalloc-bench