Wanted to show off my little project which helps with reverse engineering APIs used by various apps. It takes HTTP traffic captured by mitmproxy and generates an OpenAPI specification for a given REST API.
I have used it already on two apps and the results are good enough to write an alternative client or quickly automate some stuff.
Hilarious indeed! The first thing I thought of with this project is actually AirBnB, because the sort/filter/map view is so terrible and missing features. AirBnB captures data on a bunch of stuff, but doesn't make it possible to search for in the UI (ever want a property with a lake view or a sauna? AirBnB knows which ones have those things, but they won't let you look for them!)
AirBnB doesn't have an official API but changes the tags so often that scrapers people put up on Github go out of date quickly. Now I can run this whenever I want to have actual search functionality (instead of the hobbled crap available on the website) and ensure that whatever flavor of API is available on the website that day is easily queryable!
Easier to modify requests vs doing it using browser tools. The ability to search for the things I mentioned is actually there, but only via an undocumented url parameter that erases itself every time you pan the map. Doing it via REST calls is much easier than trying to do it in the UI.
What a fantastic idea! I have so many half baked things that some idiot (me) built without documenting the underlying API. This will make life so much easier
It does, but it will only generate schema descriptions for JSON endpoints. This means that the URL and method will appear in the spec, but not the request/response schema.
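The JSON-only part makes sense: for a decoded JSON body you can mechanically map each value to a JSON-Schema-style type. A minimal sketch of that mapping (not the project's actual code; the example payload is made up):

```python
import json

def infer_schema(value):
    # Map a decoded JSON value to a JSON-Schema-style type description.
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: infer_schema(v) for k, v in value.items()}}
    if isinstance(value, list):
        # Assume homogeneous arrays; fall back to a bare "array" when empty.
        return {"type": "array", "items": infer_schema(value[0])} if value else {"type": "array"}
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}

body = json.loads('{"id": 7, "name": "ada", "tags": ["x"]}')
schema = infer_schema(body)
```

For a non-JSON body (HTML, binary blobs) there is no comparable structure to recover, which is why only path and method make it into the spec.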
This is really incredible. With a rooted android phone and these tools, plus a couple others [1,2,3], you can get a skeleton to implement a backend for any app you want.
In many applications you can bypass built-in verifications with some Frida [1] code. It requires more effort to do so, of course, as you'd need to find the OpenSSL methods (with a script like this [2]) and bypass the verification in there.
If you're really intent on getting it to work, downloading the binary, patching out the verification function and putting it back is also possible if you're root.
Can this be used to generate a REST documentation for your own frontend just by interacting with it?
This could be augmented with a crawler that clicks every clickable element recursively.
Totally, but you would need to do some manual cleanup and naming afterwards to make it more useful than just reading the source code. You could also, for example, use your integration tests, if you have some, to capture as many routes as possible.
Of course the generated doc should be refined (e.g. filling in missing types and error codes), but your lib would save us a lot of work and make the world a better place.
The relationship between actual utility/value and price is only vaguely correlated.
Many of the most useful things on earth can't be marketed, not because they're not worth the money but because people are extremely greedy for some kinds of domains and simultaneously are bad at realizing the impact on their lives.
E.g. I have never spent a single dollar to access music, despite it being one of the few things in life that brings me intense joy.
It's vaguely correlated because you don't value the work of others in general. This means that at some point in your life, others did not value your work and showed you that was perfectly acceptable.
I think using HAR captures is simpler for the end user than spawning mitmproxy as they don't require any installation and are extracted from the network tab of the browser devtools. Is there a reason why you didn't use them?
EDIT: I realized that mitmproxy can also get traffic from other devices like phones. Very cool project, I will think about modifying mine to support mitmproxy captures!
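For anyone curious what consuming HAR would look like: a HAR export is just JSON, so pulling method/URL pairs out of one is a few lines. A sketch (the trimmed HAR below is invented for illustration; a real one comes from the browser's "Save all as HAR"):

```python
import json

# A trimmed, hypothetical HAR export.
har_text = '''
{"log": {"entries": [
  {"request": {"method": "GET",  "url": "https://api.example.com/v1/users"},
   "response": {"status": 200}},
  {"request": {"method": "POST", "url": "https://api.example.com/v1/login"},
   "response": {"status": 401}}
]}}
'''

har = json.loads(har_text)
# Each entry carries the full request/response, so the same walk that
# yields endpoints could also feed schema inference.
endpoints = [(e["request"]["method"], e["request"]["url"])
             for e in har["log"]["entries"]]
```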
Very interesting! Would this also be able to determine what kind of auth (header tokens, cookies, etc) the APIs require or is that something you still need to detect manually?
this is absolutely insane!!! I understand capturing the REST API network part; is it then examining the request bodies and headers being sent back and forth to figure out the API?
From what I understand it’s also somewhat how JIT works in various JavaScript engines: observe the sorts of objects (which naively have the performance characteristics of hash tables) you see, and start defining static offsets for fields you observed. The JIT’d (fast) objects may morph over time as new fields are observed, but I’d imagine it’s a similar idea to creating documentation… “this object tends to have these fields, so just pretend those are the only fields it can have, until another request proves otherwise”, with similar guess/checking for their types/etc.
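That "assume the fields seen so far are all there are, widen on new evidence" idea is easy to sketch. A toy version (my own illustration, not how any particular JIT or the project actually does it): track the set of observed types per field, and mark a field optional once a sample omits it.

```python
def merge_shapes(samples):
    # JIT-"shape" style inference: pretend the observed fields are the only
    # fields, and widen the shape whenever a new sample proves otherwise.
    shape = {}
    for obj in samples:
        for key, value in obj.items():
            shape.setdefault(key, set()).add(type(value).__name__)
    # A field is "required" only if every sample contained it.
    seen_in_all = set(samples[0]) if samples else set()
    for obj in samples[1:]:
        seen_in_all &= set(obj)
    return {k: {"types": sorted(v), "required": k in seen_in_all}
            for k, v in shape.items()}

shape = merge_shapes([
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "bob", "nickname": "b"},  # new field widens the shape
])
```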
I thought about it, but it would be hard to distinguish between an enumeration and just static data. For example, if you logged in with only one account, it could classify the "username" field as an enumeration, because there is only one captured value.
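One possible mitigation (just a heuristic sketch, with thresholds picked arbitrarily): only label a field as an enum once there are enough samples *and* few distinct values, so a single captured login can't turn "username" into a one-member enum.

```python
def classify_field(values, max_enum=5, min_samples=10):
    # Heuristic: demand both sample volume and low cardinality before
    # calling something an enum; otherwise fall back to plain string.
    distinct = set(values)
    if len(values) >= min_samples and len(distinct) <= max_enum:
        return {"type": "enum", "values": sorted(distinct)}
    return {"type": "string"}

# 12 captures of a "status" field vs. 1 capture of "username":
status = classify_field(["active", "banned", "active"] * 4)
username = classify_field(["alice"])
```

It still misfires on genuinely static fields that appear often, which is probably why the author left it out.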
This is awesome; I’m going to try it as soon as I get back to my desk. I’ve been working on trying to glue together tools to translate Charles proxy output to OpenAPI (swagger). I think it would be a great tool to have in a web app reverse engineering toolbox.
I did something similar a year ago at the company where I work. I basically wrote a middleware that intercepts all requests (Express.js) and writes an OpenAPI YAML file. It diffs previous requests to see which parts of the request path could be variables. The system isn't perfect, but it gets you 95% there, which is better than having no documentation, hand-writing it, or keeping a spec file updated with the changes people introduce in the code. (Got promoted to tech lead after this :-) )
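The path-diffing trick is neat and language-agnostic. A minimal sketch of the idea in Python (my reconstruction, not the middleware itself): split captured paths on "/", and any position where values differ becomes a template variable.

```python
def generalize(paths):
    # Diff captured paths segment by segment; a position where values
    # differ across captures becomes an OpenAPI-style path variable.
    split = [p.strip("/").split("/") for p in paths]
    if len({len(s) for s in split}) != 1:
        return None  # differing depth: safer to treat as separate routes
    out = []
    for i, seg in enumerate(split[0]):
        values = {s[i] for s in split}
        out.append(seg if len(values) == 1 else "{var%d}" % i)
    return "/" + "/".join(out)

template = generalize(["/users/123/orders", "/users/456/orders"])
```

A real version would also need to avoid over-merging (e.g. "/users/me" vs "/users/123"), which is where the manual cleanup comes in.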
Roughly speaking:
WSDL is to XML web services as OpenAPI is to REST
They both model the API and message structure of an API. AFAICT WSDL goes a little farther in that you can declare message sequences (I might be giving short shrift to OpenAPI here).
Hi, I would also like to add another tool I'm contributing to at work (cisco) called APIClarity [1]. It aims at reconstructing swagger specifications of REST microservices running in K8S, but can also be run locally.
This is a challenging task and we don't support OpenAPI v3 specs yet (we are working on it).
Feel free to have a look, and get ideas from it :)
This would come in very handy for codebases where an OpenAPI v3 spec would be welcome, but is too onerous to create by hand. Run this for a bit, have it spit out a nearly complete spec, and tweak it a bit to output the final product.
In fact, it is precisely what we did to generate the OpenAPI docs for NodeBB [1]. We had an undocumented API that we turned into an OpenAPI v3 file.
The question is maybe a bit off-topic and vague; that's because I struggle to express it with the right terms:
I'm looking for a generic tool to build and then serve:
Accept Incoming request (API contract A)
Send outgoing request (API contract B) potentially with parameters from the incoming request
Receiving incoming response (API contract B)
Do some translations/string manipulation
Send outgoing response (API contract A)
mitmproxy (https://mitmproxy.org/) has scripting support that will let you do most of this.
For example, you can expose mitmproxy, listen to HTTP requests for a specific host (using this API: https://docs.mitmproxy.org/stable/api/mitmproxy/http.html), intercept the request, do whatever API calls you need, and inject a response without ever forwarding the request to the original server.
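A sketch of what such an addon file can look like (the host name is a placeholder, and the canned response is invented; see the mitmproxy addons docs for the full API):

```python
# stub_api.py -- run with: mitmdump -s stub_api.py
import json

TARGET_HOST = "api.example.com"  # hypothetical host to intercept

def stub_body(path):
    # Pure helper: build the canned JSON body we want to return.
    return json.dumps({"path": path, "stub": True}).encode()

def request(flow):
    # mitmproxy calls this hook for every client request.
    if flow.request.pretty_host == TARGET_HOST:
        # Deferred import: the mitmproxy package is only needed (and
        # guaranteed present) when this file runs inside mitmproxy.
        from mitmproxy import http
        # Setting flow.response short-circuits the flow: the request is
        # never forwarded to the real server.
        flow.response = http.Response.make(
            200, stub_body(flow.request.path),
            {"Content-Type": "application/json"},
        )
```

From there, doing the "contract A in, contract B out" translation is a matter of making your own outgoing call inside the hook and reshaping the result before assigning it to `flow.response`.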
If you don't want to act like a proxy, you're going to approach this like a normal web application that does HTTP requests using whatever HTTP client your framework of choice uses.
I've always wanted to build something similar to this, by reading HAR files captured right out of the devtools. Have you given any thought to that as an alternative input?
Is it possible to do this on wireshark/tcpdump pcap dumps? Like for finding out hostnames, endpoints and request packets of HTTPS requests that an android app is making?
The problem with pcap is that the requests there would be encrypted, and there is basically no practical way to decrypt them.
mitmproxy solves that by sitting between the client and server and injecting its own self-signed certificate (which you need to add to the trusted certificates on the phone, which requires root).
For Android you'll probably need root access (unless the app developer has opted in to loading your user-imported certificate authorities). For iOS this should be easier.
However, many apps apply cert pinning in production builds, which will require tools like Frida to disable them, which in turn requires root access/a jailbreak to function.
Alternatively, you could pull the apps from your phone without root (at least on Android), patch the most obvious cert pinning out (usually in the network manifest file) and install the new version.
Step 2: features for training a language model on the request and response variables in the mitm stream, and a shim for standing up a fully ML-data-driven, zero-code mock backend.