Don't Trust Any Library, Not Even Your Standard Library

- go golang mitm goproxy performance pprof

For one of our projects, I was tasked to write a mitm proxy that will block the requests if checks done in another service fails. While we are a C++ house, popular proxies like nginx or squid are hard to develop, so I decided to use Go and its most popular mitm proxy library goproxy. Development was easy thanks to simple api and very well documentation of how to do necessary operations like creating necessary authority etc.

Feedback I got from first release candidate was baffling. Our user tests showed even for minimal incoming requests, proxy was using every bit of processing power, an unlikely outcome I thought as custom logic for proxy shouldn’t be that taxing for cpu and I was using the most popular mitm proxy. After confirming the problem, I added runtime/pprof to project and benchmarked it sending 1000 requests, with passing of two and a half minutes the culprit has shown itself.

      flat  flat%   sum%        cum   cum%
         0     0%     0%    158.32s 94.68%  net/http.(*conn).serve
         0     0%     0%    158.26s 94.64%  github.com/elazarl/goproxy.(*ProxyHttpServer).ServeHTTP
         0     0%     0%    158.26s 94.64%  github.com/elazarl/goproxy.(*ProxyHttpServer).handleHttps
         0     0%     0%    158.26s 94.64%  net/http.serverHandler.ServeHTTP
     0.01s 0.006% 0.006%    158.21s 94.61%  github.com/elazarl/goproxy.TLSConfigFromCA.func1
         0     0% 0.006%    158.20s 94.61%  github.com/elazarl/goproxy.TLSConfigFromCA.func1.1 (inline)
         0     0% 0.006%    158.20s 94.61%  github.com/elazarl/goproxy.signHost
     0.02s 0.012% 0.018%    153.28s 91.66%  math/big.nat.expNN
     2.21s  1.32%  1.34%    152.82s 91.39%  math/big.nat.expNNMontgomery
     0.02s 0.012%  1.35%    151.34s 90.50%  crypto/rand.Prime
         0     0%  1.35%    151.34s 90.50%  crypto/rsa.GenerateKey (inline)
         0     0%  1.35%    151.34s 90.50%  crypto/rsa.GenerateMultiPrimeKey         

To intercept a request, a mitm proxy needs to forges a new certificate for request’s host. As above statistics shows forging of new certificates is the big monster I needed to chase. After investigating goproxy’s repository, I have stumbled upon an api for caching certificates. With new fix, I had a better result.

      flat  flat%   sum%        cum   cum%
         0     0%     0%    21800ms 86.96%  math/big.nat.expNN
     260ms  1.04%  1.04%    21640ms 86.32%  math/big.nat.expNNMontgomery
         0     0%  1.04%    21600ms 86.16%  net/http.(*conn).serve
         0     0%  1.04%    21540ms 85.92%  github.com/elazarl/goproxy.(*ProxyHttpServer).ServeHTTP
         0     0%  1.04%    21540ms 85.92%  github.com/elazarl/goproxy.(*ProxyHttpServer).handleHttps
         0     0%  1.04%    21540ms 85.92%  net/http.serverHandler.ServeHTTP
         0     0%  1.04%    21500ms 85.76%  github.com/elazarl/goproxy.TLSConfigFromCA.func1
         0     0%  1.04%    21500ms 85.76%  github.com/elazarl/goproxy.TLSConfigFromCA.func1.1
         0     0%  1.04%    21500ms 85.76%  github.com/elazarl/goproxy.signHost
         0     0%  1.04%    21500ms 85.76%  main.(*CertStorage).Fetch
    5620ms 22.42% 23.45%    21300ms 84.96%  math/big.nat.montgomery
         0     0% 23.45%    20460ms 81.61%  crypto/rsa.GenerateKey (inline)

While this is better on paper, this is not an acceptable performance. Doing some googling for why rsa.GenerateKey would be slow revealed me crypto/tls: slow server-side handshake performance for RSA certificates without client session cache. Statistics shown at issue are quite similar to my results, so it was safe to say I hit bingo. The result I got with authority created with ECDSA is below (without key cache).

      flat  flat%   sum%        cum   cum%
     1.51s 42.78% 42.78%      1.54s 43.63%  runtime.cgocall
         0     0% 42.78%      1.54s 43.63%  syscall.SyscallN
         0     0% 42.78%      1.07s 30.31%  internal/poll.execIO
         0     0% 42.78%      1.04s 29.46%  crypto/tls.(*Conn).HandshakeContext (inline)
     0.01s  0.28% 43.06%      1.04s 29.46%  crypto/tls.(*Conn).handshakeContext
         0     0% 43.06%      1.04s 29.46%  syscall.Syscall9
         0     0% 43.06%         1s 28.33%  github.com/elazarl/goproxy.(*ProxyHttpServer).handleHttps.func2
         0     0% 43.06%      0.73s 20.68%  internal/poll.(*FD).Write
         0     0% 43.06%      0.73s 20.68%  net.(*conn).Write
         0     0% 43.06%      0.73s 20.68%  net.(*netFD).Write
         0     0% 43.06%      0.72s 20.40%  internal/poll.(*FD).Write.func1
         0     0% 43.06%      0.72s 20.40%  syscall.WSASend

Well, I can say case closed. But I am curious how long execution time for ecdsa.GenerateKey. Let me search for it.

      flat  flat%   sum%        cum   cum%
         0     0%     0%      0.03s  0.85%  crypto/ecdsa.GenerateKey
         0     0%     0%      0.03s  0.85%  crypto/elliptic.p256Curve.ScalarBaseMult

So yeah, you should use ECDSA instead of RSA in Go.