This is a guest post by Elazar Leibovich (Google+, Github). Many thanks to Ron for giving me this stage.
Why not Fiddler2?
Ron once asked me how to make a few files on the server load more slowly, in order to test the many page-loading patterns clients might see and to debug CSS races.
Any idea how to selectively slow down transfer of some but not all files from a web server?
I want to simulate a “css race” that happens when csses are delivered slowly.
Eventually, Ron found a Fiddler extension doing just that. But it led me to the question: how would one tamper with traffic to one's own servers in general?
The obvious answer is writing a Fiddler extension, but I find this solution lacking. The usual complaints apply here: Fiddler is a heavy GUI app, it doesn’t run on Linux, and it’s not open source. The main technical point is that Fiddler is not designed to run as a real proxy. Fiddler is designed to listen on :8080 on my box, for looking at the traffic I’m generating myself. However, I often want more than that. For instance, I want my QA team to use a proxy that emulates a flaky mobile connection, or I want to record the traffic my QA team generated so that I can replay it and use real user input for load testing.
I was looking for a framework, à la Ruby on Rails, that would allow me to write custom HTTP proxies, and I found none. So I decided to write one (open source on GitHub, of course).
A taste of GoProxy
GoProxy is a Go library that lets you build a custom HTTP proxy. By default you get a basic transparent proxy that simply delivers the request from the proxy client to the remote server and writes the response back:
proxy := goproxy.NewProxyHttpServer()
proxy is a net/http handler; start it just as you would start any other Go HTTP handler:
http.ListenAndServe(":8080", proxy)
How do we customize it? We’ll add “handlers”: request handlers and response handlers. A request handler receives the request after it arrives from the proxy client but before it is sent to the remote server, and returns another (maybe the same) request. Here is a simple request handler that just adds a header:
proxy.OnRequest().DoFunc(func(r *http.Request, ctx *goproxy.ProxyCtx) (*http.Request, *http.Response) {
	r.Header.Set("X-Proxy", "yxorP-X")
	return r, nil
})
Note that the request handler can also return an HTTP response. If that response is not nil, the proxy never sends the request to the remote server and instead returns the handler’s response to the client.
The case of response handlers is almost identical:
proxy.OnResponse().DoFunc(func(r *http.Response, ctx *goproxy.ProxyCtx) *http.Response {
	r.Header.Set("X-Proxy-Resp", "pseR-yxorP-X")
	return r
})
Here we’re adding a header to every response sent back to the client, leaving the request untouched.
OnRequest and OnResponse also accept a RequestCondition or ResponseCondition, respectively, as arguments. Those conditions are interfaces that Handle a request or a response and return a boolean: their decision whether or not to activate the handler given to the DoFunc method right afterwards.
proxy.OnResponse(goproxy.IsLocalHost).DoFunc(func(r *http.Response, ctx *goproxy.ProxyCtx) *http.Response {
	r.Header.Set("X-Proxy-Resp", "pseR-yxorP-X")
	return r
})
This will add the X-Proxy-Resp header only to responses for requests that were sent to localhost.
Another aspect of an HTTP proxy is HTTPS connections. Generally speaking, when a proxy client wants to use HTTPS over a proxy, it sends the proxy an HTTP request of the form “CONNECT remotehost.com:443 HTTP/1.1\r\n\r\n”, and the proxy then forwards all bytes sent to it by the client to remotehost.com:443. We allow you to eavesdrop on the connection with the same mechanism as the OnRequest request handler.
proxy.OnRequest(goproxy.ReqHostIs("example.com:443")).HandleConnect(goproxy.AlwaysMitm)
proxy.OnRequest(goproxy.UrlIs("example.com/")).DoFunc(...) // now we will handle encrypted traffic to example.com
I tried to include decent documentation and examples in the source code; please let me know if anything is unclear.
One important note: since the project is still in its infancy, in a pre-alpha stage, I reserve the right to change and break the API occasionally until I figure out what the best API is. I hope to stabilize the code in a few months and ensure no breaking changes are introduced, but to do that I need user input.
Do visit http://github.com/elazarl/goproxy and let me know what you think about the code, documentation & usage examples.
Key Take-Aways from The Project
Go
Google’s Go is a very pleasant language, which was just released. I have no doubt this project would have taken much more time in other languages.
There are many merits to the language itself, but I think that what really made the difference was the absolutely excellent standard library for HTTP connections and data streams. I’ve never seen such a nicely layered abstraction, allowing you to use http.Get(“http://example.com”) as a quick replacement for wget on the one hand, and on the other hand allow you to read part of the response and then drop the connection.
The go tool is an excellent plus. It does a good job of managing dependencies, although it is not as strong as Maven. For example, you must have an internet connection when building a package for the first time; you cannot rely on a local repository. Moreover, it cannot manage the versions of software packages to ensure all developers are using the same version.
A weak point of Go is the immaturity of parts of the standard library and of the ecosystem. For example, JPEG encoding performance was not adequate to re-encode many JPEGs on the fly (check out the 40 LOC image flipper proxy!), some parts of the library lack documentation, and the ecosystem is unstable. A certain library I was using, which was mentioned in the official golang blog, changed its API in the middle of development (hey, I have no complaints about the author: he did an excellent job, but it’s important to know that such issues may exist). In another case a stray printf remained in a library, causing weird printouts from time to time.
A peculiar benefit of Go, as I see it, is that it is a statically typed, memory-safe language, yet it can produce a standalone executable that depends on nothing at all and starts up fast. There are other programming languages trying to do that, but which one of those has a profiler? They are compiled to C, whereas Go compiles directly to machine code. (D might be an example of such a language, but I don’t know it well enough to form an opinion.)
Software design
The most important lesson I learned is, never ever design an API without writing real world useful software that uses it. I found most of the API gotchas while writing small examples.
The second important lesson is, as Joshua Bloch put it “when in doubt – leave it out”. Deleting code is the most effective way of improving it. Look at your API and try to see what you can delete without major consequences. Then delete it. Yes, you definitely have at least one function which is not needed, and is best left out.
The third lesson is, write documentation, and try to cover a lot of the code. While explaining how to use your API, you’ll find the places where the API doesn’t make a lot of sense.
The fourth lesson is, the API will never be perfect. You’d better release it and get feedback from real users than endlessly “improve” it in the lab.
(Heck, while writing this, I noticed I didn’t give any way to block CONNECT requests without MITM-ing the HTTPS connection. Now it is fixed, but there are probably more problems to come.)