How to Hack APIs in 2021 (labs.detectify.com)
244 points by sharestuff | 2021-08-10 | 92 comments




Interesting article. I for one absolutely hate single page web applications. Amongst other things, they out-source a lot of the computational load from the business to the customer, and they have a habit of riding roughshod over a user's browser preferences, patterns, and desires -- leading to a proliferation of Greasemonkey scripts and uMatrix use amongst technical users, and I suspect frustration among everyone else. Moreover, as this article shows, rather than exposing a finite number of "user data input" devices, you have to expose literally all your APIs and make them secure and robust. I just inherently think that's a larger attack surface.

This would also depend on the implementation.

I also always have the same nagging feeling at the back of my head. I have put off learning/playing around with all SPA technologies because of this.

To me it looks like we are increasing the surface area for an attack - again, this is a hunch, a kind of smell test.

But I also see that React etc. are running away with all the job offers.


I'm finding it hard to grasp the effective difference between an SPA and a traditional webpage when it comes to security. The only real difference is your Content-Type, right? SPAs usually serve more JSON, and traditional pages serve more HTML. "Oops I accidentally exposed the wrong API" is basically the same as "oops I accidentally failed to lock down /admin/orders". I think all of the examples in the article are not specific to SPAs or "APIs" at all, and can all occur on any public HTTP endpoint that does any kind of real work.

In theory, yes; in practice I've observed many seasoned software engineers operate without being paranoid enough about the fact that the APIs they publish can be hit by clients other than the one they designed.
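For illustration, a minimal Express-style sketch of the per-request check every endpoint needs once you assume arbitrary clients (the route, the findOrder helper, and the field names are all hypothetical):

```ts
import express from "express";

const app = express();

// Hypothetical lookup; assume it returns the order row or null.
async function findOrder(id: string): Promise<{ id: string; ownerId: string } | null> {
  return null; // stand-in for a real database query
}

// Every endpoint has to enforce authorization itself, because any client can call it.
app.get("/api/orders/:id", async (req, res) => {
  const userId = (req as any).userId; // assumed to be set by auth middleware from a session/JWT
  const order = await findOrder(req.params.id);
  if (!order) return res.status(404).end();
  if (order.ownerId !== userId) return res.status(403).end(); // object-level check, not just "logged in"
  res.json(order);
});
```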

In a server-rendered site, the developers create pages with a complete model of the user interaction in their heads. On an SPA, the API tends to be data-driven, so the backend developers have no idea which users will access which data, or why.

There's nothing inherent about that. You can design a server rendered site using only generic data-driven pages, and you can design an SPA API with a concrete interaction model. But, for some reason, people tend to not do those.


Depends on your API, I think. APIs are usually more flexible and powerful than the rendered pages that make use of them. At some point, we have to lock down the data. And in a server-rendered multi-page website (henceforth just 'MPA'), that point is on each page, while for an SPA it's in the API.

In an SPA, we call the API to fetch x, y, & z to render it. In an MPA, we query the database to fetch x, y, & z to render it.

Since the query to the database is entirely server side, it does not need to be locked down in the same way that the API does.

Mentally, it's a lot easier to look at the /admin/orders page and say "should this user be able to see all this data that I can see is visible?". It's a lot harder to wrap your head around all the myriad ways that someone might call the API, and understand whether you've let anything slip through. With the MPA, those items that shouldn't be there should be pretty obvious, just by looking at the page and seeing what data is visible.

For an MPA, the data available to the client is all and only what is rendered on a page. For an SPA, it's all and only what is available through the API. You'll be looking at rendered pages frequently as you develop, but you will rarely (never?) be looking at the full possible output of the API.

Some people will lock down which queries can be called against the API in production. This can mitigate some of the issues here, and is probably good practice for websites that aren't intending to make their API available directly. Particularly GraphQL APIs.
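As a sketch of what that lock-down can look like for GraphQL, here is a hand-rolled persisted-query allowlist check; the hash set and function names are illustrative, and real setups usually get this from the GraphQL server or a gateway:

```ts
import { createHash } from "crypto";

// Hashes of the queries the shipped frontend actually sends, generated at build time.
const allowedQueryHashes = new Set<string>([
  // "af3c…"  <- hypothetical sha256 digests would go here
]);

function isQueryAllowed(query: string): boolean {
  const digest = createHash("sha256").update(query).digest("hex");
  return allowedQueryHashes.has(digest);
}

// Called before executing a request; rejects ad-hoc queries in production.
function checkGraphQLRequest(body: { query: string }): void {
  if (process.env.NODE_ENV === "production" && !isQueryAllowed(body.query)) {
    throw new Error("Query not on the allowlist");
  }
}
```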


> These apps [single page web applications] also tend to feel snappier because page loads are not required for every request.

This isn't my experience at all, especially when network conditions are not great. The browser has error handling and a progress bar. Single page web applications often have bad or no error handling, and you have no idea if the request failed, the code errored, or something else went wrong. You need to refresh the whole page if something goes wrong, which results in having to load a ton of JS all over again, negating all possible savings in time and data usage.


Exactly. The only time I felt the supposed "snappiness" is for documentation websites (so mainly text) that pre-fetch all links.

I agree, and I hate two things about SPAs.

- Works terribly in bad networks.

- And the thing I hate the most is the shifting of images and links while the page is still loading. But this may be an implementation issue.


"- And the thing I hate the most is the shifting of images, links when the page is still loading. But this may be a implementation issue."

Thats just bad design.


I suspected that too, but I do not know enough about these reactive frameworks to be 100% sure. But as a user of such apps, it is absolutely frustrating.

To stop the jumping, you need to explicitly size yet-to-load sections. Without JS this means giving images explicit sizing; with JS it means any section can be dynamically loaded and, therefore, jump. So good design takes dynamic loading into account and places size bounds accordingly.

Reactive libraries/frameworks don't explicitly make this worse or better, except that their presence implies a high chance of dynamic loading and, therefore, more opportunity for bad design. In addition, most component libraries fail to communicate /who/ needs to size a component and whether it is ever dynamic. It really doesn't help that most 'official' examples fail to resolve these issues.
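A small sketch of what "explicit sizing" means in practice: intrinsic dimensions on images plus a reserved placeholder for a dynamically loaded section. The component, prop shape, and pixel values are made up.

```tsx
import React from "react";

// Space is reserved up front, so content arriving later cannot shift the layout.
function ProductCard({ product }: { product: { name: string; imageUrl: string } | null }) {
  return (
    <div style={{ minHeight: 320 }}>
      {product ? (
        <>
          {/* width/height give the browser the aspect ratio before the image loads */}
          <img src={product.imageUrl} width={300} height={200} alt={product.name} />
          <h3>{product.name}</h3>
        </>
      ) : (
        <p>Loading…</p>
      )}
    </div>
  );
}

export default ProductCard;
```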


So at least in this aspect, bad design seems to have gotten a lot easier.

Something endemic to SPAs.

As is common when people rant about SPAs, your dislike (at least as written here) is not actually about SPAs but about other things.

> Works terribly in bad networks

Yes, software traditionally works shitty under bad network conditions unless the developer actively tests under bad network conditions or has previous experience handling them. This is just as true for anything ever developed that touches a network.

> the shifting of images

This is simply developers forgetting to add width and height attributes to their <img/> elements. This has been happening since the dawn of the <img/> element and is unlikely to disappear. It also has nothing to do with SPAs; the same happens with server-rendered HTML.


> unless the developer actively tests under bad network conditions

> This is simply developers forgetting

That's the whole thing. SPA = state. It requires a lot of dev time to properly handle everything. With stateless applications, you can simply refresh your browser.

The sluggishness is not only because of bad network conditions, but it's multiplied by the huge application that has to be sent over the network, application initialization, and the many subsequent network requests.


> The sluggishness is not only because of bad network conditions, but it's multiplied by the huge application that has to be sent over the network, application initialization, and the many subsequent network requests.

A "huge" application can be broken up with code splitting/dynamic imports. Initialisation can be seeded with serverside data or saved in browser storage between pages.

The only semi-unavoidable part is the "subsequent network requests", but even these can be sped up with caching, batching, etc.
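A minimal sketch of the code-splitting idea, using a dynamic import() so a heavy module is only fetched when it's actually needed; the module path, element IDs, and renderChart function are hypothetical:

```ts
// The charting module is split out of the initial bundle; bundlers like webpack or
// Vite create a separate chunk when they see a dynamic import().
async function showReport(data: number[]) {
  const { renderChart } = await import("./heavy-chart"); // hypothetical local module
  renderChart(document.getElementById("report")!, data);
}

// Only users who actually open the report pay the download cost of the chart code.
document.getElementById("show-report")?.addEventListener("click", () => {
  void showReport([1, 2, 3]);
});
```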

> It requires a lot of dev time to properly handle everything.

But yeah, these things take effort


The network requests could be handled with more intelligent APIs.

But if you take everything into account, you can also develop a really good native app.

This is not reality.


>Yes, software traditionally works shitty under bad network conditions

Not everything is equally affected by bad network conditions. SPAs are generally affected very badly; indeed, what is a bad network condition for an SPA might be acceptable for a traditional static page.


SPAs can be built to work well offline. I've written them myself. There is nothing inherent in an SPA that makes it poor at this, quite the opposite. SPAs have excellent tooling for offline use.
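For context, that tooling is mostly the service worker and Cache APIs; a bare-bones cache-first sketch follows. The file names are placeholders, and real apps usually reach for a library such as Workbox rather than writing this by hand.

```ts
// sw.ts, compiled to /sw.js and registered from the page with:
//   navigator.serviceWorker.register("/sw.js");
const CACHE = "app-shell-v1";

self.addEventListener("install", (event: any) => {
  // Pre-cache the application shell so the app opens with no network at all.
  event.waitUntil(
    caches.open(CACHE).then((cache) => cache.addAll(["/", "/app.js", "/app.css"]))
  );
});

self.addEventListener("fetch", (event: any) => {
  // Cache-first: serve from the cache and fall back to the network.
  event.respondWith(
    caches.match(event.request).then((cached) => cached ?? fetch(event.request))
  );
});
```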

An SPA's network dependency or robustness totally depends on the product's design. Some types of applications lend themselves well to offline first (anything where the user owns their content/data: todos, notes, documents, pictures, etc.). Others are much more dependent on fresh data, which pretty much means anything big enough that it's unreasonable to replicate it to the device.

I'm a fan of "offline first" design and have been a proponent at various companies, to the point where I can build the feature in at very little additional cost if it is considered and decided on in the design phase. Bolt-on patterns are messy.

However, the reality is that very few customers see this as a significant advantage. Which means that it doesn't really translate to market success. If budget is the #1 priority I can't in good conscience advocate for offline first unless it's going to offer a significant win for the company somewhere.


I’m asking this from a genuine place of curiosity, because last time I checked a few years ago the answer was “terrible”. Has the “save a file locally and load it again later” story for SPAs improved any?

There is no universal local file system API, so "terrible" might be a reasonable description of this.

There is a Chrome-only local file system API.

Generally SPAs are limited to browser provided storage like IndexedDB.
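The Chrome-only API mentioned above is the File System Access API; a hedged sketch of saving straight to a user-chosen local file with it (other browsers need a fallback, hence the feature check):

```ts
// Saving to a real local file; showSaveFilePicker is currently Chromium-only.
async function saveToDisk(text: string): Promise<void> {
  if (!("showSaveFilePicker" in window)) {
    throw new Error("File System Access API unavailable; fall back to a download link");
  }
  const handle = await (window as any).showSaveFilePicker({ suggestedName: "notes.json" });
  const writable = await handle.createWritable();
  await writable.write(text);
  await writable.close();
}
```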


I have implemented plenty of SPAs (over the years) that use the pattern of A) allowing the user to download their current state as a JSON/EDN file, and then B) allowing the user to upload a state file and continue from where they last downloaded their state.

This has been easy to implement for as long as I can remember, so I'm not sure why you'd say it's terrible. What stopped you the previous times?
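A minimal sketch of that A/B pattern, assuming JSON state and a plain file input; the function names are illustrative:

```ts
// A) Offer the current state as a downloadable JSON file.
function downloadState(state: unknown, filename = "state.json"): void {
  const blob = new Blob([JSON.stringify(state, null, 2)], { type: "application/json" });
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url);
}

// B) Read a previously downloaded file back in from an <input type="file">.
async function loadState(input: HTMLInputElement): Promise<unknown> {
  const file = input.files?.[0];
  if (!file) throw new Error("no file selected");
  return JSON.parse(await file.text());
}
```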


Ok, so that’s kind of what I figured was the current state of things. Compared to being able to hit “Save” and have a local copy updated, that’s a pretty subpar workflow. I get why it’s like that (preventing a sandboxed website from being able to update files on the local filesystem) but...

Why do you write <img/> in an authoritative tone? It's not 2000 anymore, when we pretended XHTML or polyglot HTML was a thing. It's particularly odd to see that old cargo-cult idiom (or, worse, with additional random spaces) used in a post lecturing users about HTML5-era SPA supremacy.

? What's wrong with XHTML? <img /> is clearer than <img> for anyone familiar with XML, and XHTML documents are easier to parse (e.g. can be processed with XSLT stylesheets).

Nothing wrong with XHTML per se (I did an internal site using XSLT in the early/mid 2000s), but XML/XHTML has been on the way out for the better part of this millennium, on the web at least. If you're developing web content and/or browser apps, you should know HTML IMO, and XML is the least of your concerns. Not looking forward to apps mixing XSLT and JavaScript ;)

Can't stand "<bla />" though with that pointless/clueless space. The only place where I've encountered these are older JSP, FreeMarker, or Thymeleaf/Spring MVC apps (ugh).


JSX (React's default markup language) expects all tags to be closed, and self-closing tags are valid. If you spend a significant amount of time writing React apps (with JSX) then writing self-closing tags becomes pretty second nature. It's not exactly XHTML (I can't think of any other of XHTML's idioms it uses).

I don't know what you mean, self-closing tags are part of the HTML (5) standard: https://html.spec.whatwg.org/#self-closing-flag

"shifting of images"

So much truth. More than once I was trying to click on some image that had a link underneath, and it suddenly moved to some other place and I ended up clicking something different.


A truly intolerable thing about SPAs and JavaScript is that regular HTTP caching of images and fonts had to be limited because JS APIs can be and are used for fingerprinting, driving the whole web thing ad absurdum.

Switching off JS/fingerprinting doesn't really help either, since it'll just disproportionately benefit Google's stronghold on web analytics even more.


Fingerprinting is not just JS. Fingerprinting is possible just using CSS rules alone.

The bigger issue, which seems to be what you are complaining about, is the gaping hole in privacy created by 3rd-party storage/resources. That is not a particular problem with SPAs, and can even be exploited without JavaScript.

I'm not sure where you're coming from regarding Google and web analytics. When 3rd party storage is gone (or partitioned) it sounds like everyone would be on the same page in terms of what data they can collect.


One of my real pet hates is software developers who assume everyone's running their software on an excellent internet connection. Badly written SPAs are the worst offenders but I also pretty much gave up on any sort of regular gaming because pushing these enormous 10-20 GB updates over a sluggish connection just became insufferable. Add that to the constant and shameless fleecing of customers that's apparently the norm now and the enjoyment to effort ratio is just too low to bother with.

> One of my real pet hates is software developers who assume everyone's running their software on an excellent internet connection.

That could be said for a lot of assumptions developers make. Everyone has 32GB of RAM, everyone has an SSD, everyone has an i7...

It is an old problem, but like almost everything else in computing for some reason it seems to have become much worse since about 2010.


The spread of capabilities is bigger now.

In 2000 a developer might have been developing on a Pentium 3 with 128 MB of RAM, but they could reasonably expect their audience to be using at least a 486 with 16 MB of RAM because that was the minimum spec for IE4.

Now you're stuck with trying to impress people with a Ryzen Threadripper and 64GB of DDR5, but your webapp still has to support everyone's iPhone 7 (with 2GB to share with iOS and everything else they have running) for as long as Apple does.


Apple still supports the older iPhones. For example, the iPhone 6 that doesn't get the latest iOS version anymore (stuck with iOS 12) is still supported by regular security updates (the last iOS 12 update was 54 days ago).

Apple supports (at least security wise) probably more devices than you think.


What I think is even more different is that someone with the 486 in 2000 was used to the idea that they wouldn't be able to run some software, but unable to run a website? Unheard of, it's just broken.

I was working on a proposal for a client who wanted to build a marketplace, but most of the vendors had low-end tech equipment, and he went with someone who had a lower quote. My biggest caution was that only a limited number of developers actually understand what to do with slow internet connections and old tech. The marketplace failed... Ignorance is sometimes the worst thing: people believe that because it works on their machine, it will work on everyone else's.

> One of my real pet hates is software developers who assume everyone's running their software on an excellent internet connection

Maybe it's a process problem, not a developer problem. Like management prioritizes a dozen analytics trackers and ad partners without any tooling for performance testing on a range of devices.


I guess my language was a bit imprecise, by "software developers" I meant "people involved in the software development process" rather than the specific role of the person writing the code. I guess I should have said "software companies" or "companies developing software". Managers definitely deserve their share of the blame for demanding user-hostile bloat too, likely more blame than the developers.

On the developer end (as a developer myself) we definitely deserve some of the blame for embracing things like Electron with such zeal in my opinion. I don't care how much memory a developer's workstation has, there's still a lot of hardware in use that can't take the bloat. I'm not saying everything has to be a native app written in vi against an original print of The C Programming Language as Brian Kernighan and Dennis Ritchie wrote it in 1978 or it's automatically shit, but something like React Native for the desktop would be far less horrible in terms of resource usage I reckon.


It could be that the customers most likely to pay are on more recent hardware. So why bother catering to everyone else if there is no return on investment?

Or, for free software, chances are it is ad-supported. So push as many ads and trackers as you can until the churn rate gets too high or competitors take market share.

Or, more generally, maybe the design just follows the money.


Could also be the reverse: why pay for software that won't run on your computer anyway?

Absolutely, that's a calculated decision that companies make. It is fully expected that a certain market segment is not interested in being a customer at a given price point. Market segmentation is often done by OS version or device hardware profile.

An SPA requiring a backend connection is quite simply a distributed system, and all the usual caveats about those, including network availability, apply.

Counterpoint: often I am faced with doing it right or doing it within budget, for an activation that is designed to live for only a few weeks. I aim for 100% compatibility across different hardware, OSes, browsers, user experiences, client expectations, and non-suicidal business practices.

> One of my real pet hates is software developers who assume everyone's running their software on an excellent internet connection.

It's not the developers' job to assume anything about users. It's the project managers'.


> the code errored, or something else went wrong. You need to refresh the whole page if something goes wrong, which results in having to load a ton of JS all over again, negating all possible savings in time and data usage.

Not to mention that some SPAs don't maintain state, and all of a sudden you need to jump through the whole process all over again. Personally, SPAs seem to try to reimplement a lot of browser features all over again in client-side JS.


I think it's just like anything that's gotten popular. SPAs are all over the place now, which means there are plenty of bad implementations along with the good ones. But generally you only notice the bad ones due to frustration. This isn't SPA-specific IMO. I've seen plenty of server rendered pages/sites that were also poorly done back when those were more the norm. Heck, I still help maintain a set of CGI scripts that are horrible and slow.

Yeah, especially when it's multiple sequential loads, so instead of one large HTML blob it takes forever. Usually with 270ms+ round trips if it's US East (from Australia).

Infinite scroll pages are the worst with this.

You scroll, and scroll, and scroll, and every time you reach some level "down", another section is loaded, then at one moment it stops. Something somewhere fails: no more new sections, no way to continue from that point, only a full refresh and a huge scroll down.


Which wouldn't be nearly as bad if they gave you an offset parameter, or updated the offset parameter in the URL, which they almost never do.
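A sketch of keeping that offset in the URL as pages load, so a refresh can resume instead of starting over; the parameter name and feed endpoint are made up:

```ts
// After appending a page of results, record how far we've got in the URL.
function rememberOffset(offset: number): void {
  const url = new URL(window.location.href);
  url.searchParams.set("offset", String(offset));
  history.replaceState(null, "", url.toString()); // no new history entry, no reload
}

// On load, or after a failure, resume from the recorded offset instead of zero.
async function loadMore(): Promise<void> {
  const offset = Number(new URL(window.location.href).searchParams.get("offset") ?? 0);
  const res = await fetch(`/api/feed?offset=${offset}&limit=20`); // hypothetical endpoint
  const items: unknown[] = await res.json();
  // ...append items to the page, then:
  rememberOffset(offset + items.length);
}
```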

Hello Patreon.

Exactly.

A flawless SPA backed by a flawless API that produces responses in tens of milliseconds is superior to the old ways.

But a trash SPA backed by an API that produces responses eventually if ever and requires me to open the browser's developer tools to find out what happened? You can keep it. Old-school frameset sites are better than that.


One of the best resources out there to learn about API hacking has to be OWASP Juice Shop https://owasp.org/www-project-juice-shop/

We ran it as an exercise at work to learn about security vulnerabilities, and it was great fun and highly educational. Really recommend it.


Yeah I can confirm this. We did it in our company too and it was really fun!

> did it in our company

How much time did it take? (for one person to do it)


I'm intrigued, could you elaborate on how you ran it? Was it a one-day event, or something people did in their 10% time, or something else?

Sure, we ran it over a couple of months. Generally, people did it in their spare time outside of office hours, or in 10% time. As long as it didn't impact work delivery. Everyone installed it locally using docker, and then we had a centralised server that ran the scoreboard for everyone to share and add their capture the flag tokens. We did a couple of presentations about interesting solutions as well to drum up support.

We did a write-up on our blog if you're interested: https://purplelabs.eagleeye.com/blog/the-hackathon-capture-t...


Thank you!

The moment the author mentioned SPAs as feeling snappier, you could predict that would be the main topic of discussion in the HN comments.

Yeah it’s disappointing, really.

I hate SPAs but I like APIs as I can build my own programs that add missing features to make my life easier.

>Baaackkk iiin myyy dayyyyy APIs were not nearly as common as they are now. This is due to the explosion in the popularity of Single Page Applications (SPAs). 10 years ago, web applications tended to follow a pattern where most of the application was generated on the server-side before being presented to the user.

I know it's nitpicking and not the point of the article, but I don't think that's true. APIs have become commonplace because of the rise of mobile apps, which need an API to talk to the back end. SPAs are in turn a response to the universality of APIs, because single page apps let you handle the browser as just one more of those N clients, and avoid maintaining different points of entry to your backend.

I bring this point up because people seem to hate SPAs (usually for good reasons), and it's important to realize the problem they're solving. Better handling of complex client-side logic and the like, which is usually pointed to as the main benefit of SPAs, is usually just an afterthought for companies, since the percentage of SPAs that reach a level of complexity where that's an issue is relatively low.


Eh? Couldn’t you just as easily make a traditional web app that’s just another client? Why does it need to be an SPA to use an API?

My company is a distributor of document-management software which also provides an API that connects directly to the underlying database. It's a very basic one I would never let touch the internet directly. Two weeks ago I saw that another partner had connected it directly to the web with an Angular frontend, exposing the login credentials (baked into the frontend). So their database is basically open to the world.

I told them, but they responded with something like "We are currently unable to change that. There are plans to change that in the future though".


Wait... you have nobody in charge of security in your company?

Kids these days have it so easy. It used to be you needed to scrape HTML pages and submit forms to get at someone else's data.

Now all you need to do is grab their GraphQL endpoint from devtools and you're off to the races.


The other big push for APIs comes from mobile apps, which also tend to follow a model close to the web's single-page applications. Would be good to have a reliable setup to do this from iPhone and Android devices.

> Would be good to have a reliable setup to do this from iPhone and Android devices.

I'm curious what you mean by this "reliable setup" part. Can you elaborate?


The part of the article where the researcher explains how to inspect and spoof API traffic (it's very web-centric).

I'm learning to build front ends with Vue.js, and it supports SPAs with client-side rendering and multiple routes. I intentionally chose this to take advantage of cheap (free) static site hosting, and the learning curve is easier without having to think about server management.

I didn't realize until this project that a dynamic, reactive site can use static hosting.

Vue has been a pleasure to work with. It supports server-side rendering too, but so far I haven't needed it.


Once you get more comfortable with Vue, I suggest you start looking into making the project a PWA so that it may work offline and can be "installed" by the user (if it suits your use case).

I've written a fully statically served, backendless PWA that uses the browser's local storage for persistent data, meaning that the data is fully private and owned by the user. If a user needs to sync their data between clients, they can connect the app to their Dropbox or Google Drive. There's even a chat implemented with WebRTC which allows users to communicate directly with each other, without a backend.

If you disregard the JS haters on HN, you'll realize you can do some amazing things in the pure browser runtime nowadays.


Interesting, that must be what I installed to get the Vue docs as an app. I hadn't heard the term PWA, thanks for sharing.

Not sure why this author thinks this is all so recent. I hacked my first web API in 1998 and it was exactly like an API for an SPA, but they didn't call it REST back then. The content type was multipart/form-data and the results were sent as formatted HTML, but other than that it was still a series of URL endpoints with an authentication header and an input document.

Sure, your average static content site didn't use an API in 1998, but WordPress/Drupal sites have exposed poorly secured APIs since the early 2000s even if the standard front-end didn't use them for content display.


When I was 11 I used to click the big goofy button in Netscape Navigator that would view source, and then I would hunt for passwords to logins. Hacking is for the low-IQ smoothbrain, that's why slavs do it all day every day, sorry not sorry, get off my Google Analytics you scum and then you can earn my respect.

Your web API hack may predate this author.

XMLHttpRequest wasn't introduced (in IE!) until 1999, and almost nobody was using it until after 2000. So how did you make that work without a client-side request mechanism?

ActiveX is from 1996, and Java applets are from '95. I'm not sure if those technologies were capable of much back then, but the web used to be a much more diverse place before traditional plugins got purged and replaced by JavaScript; I'd argue that JavaScript would have been the worst way to do any kind of interactive website back in those days, because almost everyone was on Windows and the alternatives were so much more useful.

The browser isn't the only consumer of an API; they are called server-to-server as well (and this was particularly common when they couldn't be called directly from the browser). Headlines from other sites, weather, hit counters, all that great late-90s web innovation.

XMLHttpRequest was part of a COM component (or maybe its own?) and was therefore able to be embedded in any Visual Basic program or Access or FoxPro front end. To be able to make calls to the internet invisibly, without a web browser? In one of those classic 90s Windows UIs? It looked and felt like magic.

Well, I suppose breaking security is one thing; in most cases webpages don't really seem to realise how much access they've already given you, though. Things you'd normally need to scrape the webpage for become a lot easier when you can just adapt the few HTTP requests you can easily identify with the network tab in most browsers.

Testing APIs with Postman is really interesting, and you can learn a lot just by inspecting requests with the browser's developer tools. We've built a command-line tool [1] to be able to simply add integration tests to our CI/CD pipeline for our website (that was difficult with just Postman or Selenium).

[1]: https://hurl.dev


Thanks for posting, this is interesting. It also reminds me of the file format and simplicity of the REST Client VS Code plugin: https://marketplace.visualstudio.com/items?itemName=humao.re...

> Instead, different components of the same page will update magically, giving it a similar feel to a native application. This model has also become more popular because ten billion different frontend JavaScript frameworks (React, Vue and Angular, etc.) have come into existence.

Seems like a real "wet sidewalks cause rain" story. That the model was becoming more popular among developers and then they made frameworks seems more likely than the other way around.

Then the "ten billion" part. The three frameworks he listed dominate. Everything in "etc." is minor in comparison.

I find this meme about JS frameworks lazy. (Note: I'm totally a web backend person, so I should be the first to do the teasing.) There seem to be much more boring answers:

1. With some exceptions (ClojureScript, TypeScript, Elm, ...), everyone who writes for the browser is forced to write JS. On the backend, people can express different opinions by using different languages (Python, Ruby, Java, Elixir, ...). So people that would split themselves into language camps on the backend are more likely to split into framework camps on the frontend.

2. The frontend environment is constantly changing (browsers) so the code is going to be churnier. You can't just not upgrade like you can on the backend.

3. User interfaces are more iterative than backend stuff, so the code is going to be churnier.

4. All this churn makes a rewrite more accessible and tempting.

I just find it easier to believe that there's some difference in the constraints and freedoms of JS devs that causes them to make more (but not really that many more) frameworks, rather than there's some secret character flaw or something.


Excellent article - it kept my interest to the end. I bookmarked and googled several references in the article.

Now I need to see an article on modern API development methods, including trusted frameworks.

Cheers


The top 7ee1 script kiddies in the world gather... let's marvel at the wonder of the cracking of an SPA some poor underpaid subcontractor put together in a couple of weeks.

This thread definitely helped me to understand that there is a huge market demand for a YouTube series teaching how to write decent SPAs with React and also how to properly secure an API.

Leave a reply if you want to be notified when the content is ready.


Heh, I did some work for these guys a few years ago. Good to see they're still about.

Note that this article has an error.

"This section could contain anything, but at minimum it needs to contain some kind of user identifier and a timeout (iat)."

The iat claim is when the token was issued, not when it expires. The exp claim is when it expires.

See also https://datatracker.ietf.org/doc/html/rfc7519#section-4.1.4
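To make the distinction concrete, a decoded JWT payload typically carries both claims as NumericDate values (seconds since the Unix epoch); the values below are purely illustrative of the claims named in RFC 7519:

```ts
// Decoded JWT payload (the middle, base64url-encoded segment of the token).
const payload = {
  sub: "user-1234", // illustrative subject (who the token is about)
  iat: 1628578154,  // issued-at: when the token was created
  exp: 1628581754,  // expiration: when the token stops being valid
};

// Verifiers reject the token once `exp` has passed; `iat` by itself says nothing
// about expiry unless the server applies its own maximum age to it.
const nowSeconds = Math.floor(Date.now() / 1000);
const isExpired = nowSeconds >= payload.exp;
```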

