Wanted to show off my little project which helps with reverse engineering APIs used by various apps. It takes HTTP traffic captured by mitmproxy and generates an OpenAPI specification for a given REST API.
I have used it already on two apps and the results are good enough to write an alternative client or quickly automate some stuff.
Hilarious indeed! The first thing I thought of with this project is actually AirBnB, because the sort/filter/map view is so terrible and missing features. AirBnB captures data on a bunch of stuff, but doesn't make it possible to search for in the UI (ever want a property with a lake view or a sauna? AirBnB knows which ones have those things, but they won't let you look for them!)
AirBnB doesn't have an official API but changes the tags so often that scrapers people put up on Github go out of date quickly. Now I can run this whenever I want to have actual search functionality (instead of the hobbled crap available on the website) and ensure that whatever flavor of API is available on the website that day is easily queryable!
Easier to modify requests vs doing it using browser tools. The ability to search for the things I mentioned is actually there, but only via an undocumented url parameter that erases itself every time you pan the map. Doing it via REST calls is much easier than trying to do it in the UI.
What a fantastic idea! I have so many half baked things that some idiot (me) built without documenting the underlying API. This will make life so much easier
It does, but it will only generate schema descriptions for JSON endpoints. This means that the URL and method will appear in the spec, but not the request/response schema.
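The JSON-only part makes sense: for a decoded JSON body you can mechanically map each value to a JSON-Schema-style type. A minimal sketch of that mapping (not the project's actual code; the example payload is made up):

```python
import json

def infer_schema(value):
    # Map a decoded JSON value to a JSON-Schema-style type description.
    if isinstance(value, dict):
        return {"type": "object",
                "properties": {k: infer_schema(v) for k, v in value.items()}}
    if isinstance(value, list):
        # Assume homogeneous arrays; fall back to a bare "array" when empty.
        return {"type": "array", "items": infer_schema(value[0])} if value else {"type": "array"}
    if isinstance(value, bool):   # check bool before int: bool subclasses int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if value is None:
        return {"type": "null"}
    return {"type": "string"}

body = json.loads('{"id": 7, "name": "ada", "tags": ["x"]}')
schema = infer_schema(body)
```

For a non-JSON body (HTML, binary blobs) there is no comparable structure to recover, which is why only path and method make it into the spec.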
This is really incredible. With a rooted android phone and these tools, plus a couple others [1,2,3], you can get a skeleton to implement a backend for any app you want.
In many applications you can bypass built-in verifications with some Frida [1] code. It requires more effort to do so, of course, as you'd need to find the OpenSSL methods (with a script like this [2]) and bypass the verification in there.
If you're really intent on getting it to work, downloading the binary, patching out the verification function and putting it back is also possible if you're root.
Can this be used to generate a REST documentation for your own frontend just by interacting with it?
This could be augmented with a crawler that clicks every clickable element recursively.
Totally, but you would need to do some manual cleanup and naming afterwards to make it more useful than just reading the source code. You could also, for example, use your integration tests, if you have some, to capture as many routes as possible.
Of course the generated doc should be refined (e.g. filling in missing types and error codes), but your lib would save us a lot of work and make the world a better place.
The relationship between actual utility/value and price is only vaguely correlated.
Many of the most useful things on earth can't be marketed, not because they're not worth the money but because people are extremely greedy for some kinds of domains and simultaneously are bad at realizing the impact on their lives.
E.g. I have never spent a single dollar to access music, despite it being one of the few things in life that brings me intense joy.
It's vaguely correlated because you don't value the work of others in general. This means that at some point in your life, others did not value your work and showed you that was perfectly acceptable.
I think using HAR captures is simpler for the end user than spawning mitmproxy as they don't require any installation and are extracted from the network tab of the browser devtools. Is there a reason why you didn't use them?
EDIT: I realized that mitmproxy can also get traffic from other devices like phones. Very cool project, I will think about modifying mine to support mitmproxy captures!
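For anyone curious what consuming HAR would look like: a HAR export is just JSON, so pulling method/URL pairs out of one is a few lines. A sketch (the trimmed HAR below is invented for illustration; a real one comes from the browser's "Save all as HAR"):

```python
import json

# A trimmed, hypothetical HAR export.
har_text = '''
{"log": {"entries": [
  {"request": {"method": "GET",  "url": "https://api.example.com/v1/users"},
   "response": {"status": 200}},
  {"request": {"method": "POST", "url": "https://api.example.com/v1/login"},
   "response": {"status": 401}}
]}}
'''

har = json.loads(har_text)
# Each entry carries the full request/response, so the same walk that
# yields endpoints could also feed schema inference.
endpoints = [(e["request"]["method"], e["request"]["url"])
             for e in har["log"]["entries"]]
```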
Very interesting! Would this also be able to determine what kind of auth (header tokens, cookies, etc) the APIs require or is that something you still need to detect manually?
this is absolutely insane!!! I understand capturing the REST API network part; is it then examining the request bodies and headers being sent back and forth to figure out the API?
From what I understand it’s also somewhat how JIT works in various JavaScript engines: observe the sorts of objects (which naively have the performance characteristics of hash tables) you see, and start defining static offsets for fields you observed. The JIT’d (fast) objects may morph over time as new fields are observed, but I’d imagine it’s a similar idea to creating documentation… “this object tends to have these fields, so just pretend those are the only fields it can have, until another request proves otherwise”, with similar guess/checking for their types/etc.
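That "assume the fields seen so far are all there are, widen on new evidence" idea is easy to sketch. A toy version (my own illustration, not how any particular JIT or the project actually does it): track the set of observed types per field, and mark a field optional once a sample omits it.

```python
def merge_shapes(samples):
    # JIT-"shape" style inference: pretend the observed fields are the only
    # fields, and widen the shape whenever a new sample proves otherwise.
    shape = {}
    for obj in samples:
        for key, value in obj.items():
            shape.setdefault(key, set()).add(type(value).__name__)
    # A field is "required" only if every sample contained it.
    seen_in_all = set(samples[0]) if samples else set()
    for obj in samples[1:]:
        seen_in_all &= set(obj)
    return {k: {"types": sorted(v), "required": k in seen_in_all}
            for k, v in shape.items()}

shape = merge_shapes([
    {"id": 1, "name": "ada"},
    {"id": 2, "name": "bob", "nickname": "b"},  # new field widens the shape
])
```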
I thought about it, but it would be hard to distinguish between an enumeration and just static data. For example, if you logged in with only one account, it could classify the "username" field as an enumeration, because there is only one captured value.
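One possible mitigation (just a heuristic sketch, with thresholds picked arbitrarily): only label a field as an enum once there are enough samples *and* few distinct values, so a single captured login can't turn "username" into a one-member enum.

```python
def classify_field(values, max_enum=5, min_samples=10):
    # Heuristic: demand both sample volume and low cardinality before
    # calling something an enum; otherwise fall back to plain string.
    distinct = set(values)
    if len(values) >= min_samples and len(distinct) <= max_enum:
        return {"type": "enum", "values": sorted(distinct)}
    return {"type": "string"}

# 12 captures of a "status" field vs. 1 capture of "username":
status = classify_field(["active", "banned", "active"] * 4)
username = classify_field(["alice"])
```

It still misfires on genuinely static fields that appear often, which is probably why the author left it out.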
This is awesome; I’m going to try it as soon as I get back to my desk. I’ve been working on trying to glue together tools to translate Charles proxy output to OpenAPI (swagger). I think it would be a great tool to have in a web app reverse engineering toolbox.
I did something similar a year ago at the company where I work. I basically wrote a middleware that intercepts all requests (Express.js) and writes an OpenAPI YAML file. It diffs previous requests to see which parts of the request path could be variables. The system isn't perfect, but it gets you 95% there, which is better than having no documentation, hand-writing it, or keeping a spec file updated with the changes people introduce in the code. (Got promoted to tech lead after this :-) )
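The path-diffing trick is neat and language-agnostic. A minimal sketch of the idea in Python (my reconstruction, not the middleware itself): split captured paths on "/", and any position where values differ becomes a template variable.

```python
def generalize(paths):
    # Diff captured paths segment by segment; a position where values
    # differ across captures becomes an OpenAPI-style path variable.
    split = [p.strip("/").split("/") for p in paths]
    if len({len(s) for s in split}) != 1:
        return None  # differing depth: safer to treat as separate routes
    out = []
    for i, seg in enumerate(split[0]):
        values = {s[i] for s in split}
        out.append(seg if len(values) == 1 else "{var%d}" % i)
    return "/" + "/".join(out)

template = generalize(["/users/123/orders", "/users/456/orders"])
```

A real version would also need to avoid over-merging (e.g. "/users/me" vs "/users/123"), which is where the manual cleanup comes in.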
Roughly speaking:
WSDL is to XML web services as OpenAPI is to REST
They both model the API and message structure of an API. AFAICT WSDL goes a little farther in that you can declare message sequences (I might be giving short shrift to OpenAPI here).
Hi, I would also like to add another tool I'm contributing to at work (cisco) called APIClarity [1]. It aims at reconstructing swagger specifications of REST microservices running in K8S, but can also be run locally.
This is a challenging task and we don't support OpenAPI v3 specs yet (we are working on it).
Feel free to have a look, and get ideas from it :)
This would come in very handy for codebases where an OpenAPI v3 spec would be welcome, but is too onerous to create by hand. Run this for a bit, have it spit out a nearly complete spec, and tweak it a bit to output the final product.
In fact, it is precisely what we did to generate the OpenAPI docs for NodeBB [1]. We had an undocumented API that we turned into an OpenAPI v3 file.
The question is maybe a bit off-topic and vague; that's because I struggle to express it with the right terms:
I'm looking for a generic tool to build and then serve:
Accept Incoming request (API contract A)
Send outgoing request (API contract B) potentially with parameters from the incoming request
Receiving incoming response (API contract B)
Do some translations/string manipulation
Send outgoing response (API contract A)
mitmproxy (https://mitmproxy.org/) has scripting support that will let you do most of this.
For example, you can expose mitmproxy, listen to HTTP requests for a specific host (using this API: https://docs.mitmproxy.org/stable/api/mitmproxy/http.html), intercept the request, do whatever API calls you need, and inject a response without ever forwarding the request to the original server.
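A sketch of what such an addon file can look like (the host name is a placeholder, and the canned response is invented; see the mitmproxy addons docs for the full API):

```python
# stub_api.py -- run with: mitmdump -s stub_api.py
import json

TARGET_HOST = "api.example.com"  # hypothetical host to intercept

def stub_body(path):
    # Pure helper: build the canned JSON body we want to return.
    return json.dumps({"path": path, "stub": True}).encode()

def request(flow):
    # mitmproxy calls this hook for every client request.
    if flow.request.pretty_host == TARGET_HOST:
        # Deferred import: the mitmproxy package is only needed (and
        # guaranteed present) when this file runs inside mitmproxy.
        from mitmproxy import http
        # Setting flow.response short-circuits the flow: the request is
        # never forwarded to the real server.
        flow.response = http.Response.make(
            200, stub_body(flow.request.path),
            {"Content-Type": "application/json"},
        )
```

From there, doing the "contract A in, contract B out" translation is a matter of making your own outgoing call inside the hook and reshaping the result before assigning it to `flow.response`.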
If you don't want to act like a proxy, you're going to approach this like a normal web application that does HTTP requests using whatever HTTP client your framework of choice uses.
I've always wanted to build something similar to this, by reading HAR files captured right out of the devtools. Have you given any thought to that as an alternative input?
Is it possible to do this on wireshark/tcpdump pcap dumps? Like for finding out hostnames, endpoints and request packets of HTTPS requests that an android app is making?
The problem with pcap is that the requests there would be encrypted, and there is basically no practical way to decrypt them.
mitmproxy solves that by sitting between the client and server and injecting its own self-signed certificate (which you need to add to the trusted certificates on the phone, which requires root).
For Android you'll probably need root access (unless the app developer has opted in to loading your user-imported certificate authorities). For iOS this should be easier.
However, many apps apply cert pinning in production builds, which will require tools like Frida to disable them, which in turn requires root access/a jailbreak to function.
Alternatively, you could pull the apps from your phone without root (at least on Android), patch the most obvious cert pinning out (usually in the network manifest file) and install the new version.
Step 2: features for training a language model on the request and response variables in the mitm stream, and a shim for standing up a fully ML-data-driven, zero-code mock backend.