
As for metadata versus data, the URL of a static image automatically discloses the image itself. The only way to claim that the history doesn't actually contain the image is if you assume that the site has gone defunct.

Unless, of course, you're willing to argue that a porn image stored on the local hard drive isn't contained in any folders on the same PC that soft-link it. You might have an interesting time trying to justify why it is contained in folders that hard-link it.
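To make the soft-link versus hard-link distinction concrete, here is a minimal sketch using only the Python standard library. The file names are made up for illustration, and it assumes a POSIX-like filesystem where unprivileged users can create symlinks. A hard link is a second name for the same stored bytes, while a soft link only stores a path, much like a URL in a history entry.

```python
import os
import tempfile


def link_demo():
    """Create a file plus a hard link and a soft link to it, then delete the original name.

    Returns a tuple:
    (hard link shares the inode, soft link is only a pointer,
     hard link still readable after deletion, soft link still readable after deletion)
    """
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical file names, chosen only for this illustration.
        original = os.path.join(d, "image.bin")
        hard = os.path.join(d, "hard_link.bin")
        soft = os.path.join(d, "soft_link.bin")

        with open(original, "wb") as f:
            f.write(b"pretend image bytes")

        os.link(original, hard)     # second directory entry for the same inode
        os.symlink(original, soft)  # small entry that only records the target path

        same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
        soft_is_pointer = os.path.islink(soft)

        os.remove(original)  # the "site goes defunct"

        hard_still_readable = os.path.exists(hard)  # data survives under the other name
        soft_still_readable = os.path.exists(soft)  # follows the link, which now dangles
        return same_inode, soft_is_pointer, hard_still_readable, soft_still_readable


if __name__ == "__main__":
    print(link_demo())
```

Once the original name is removed, the hard link still yields the bytes and the soft link does not, which is the same asymmetry as a stored URL whose host has disappeared.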




Exactly right. You could just rename a .torrent to .png and you'd essentially have the same thing. Granted, said png wouldn't be viewable, but neither is truly "hidden in plain sight," which is what I think of when I hear "hiding data in an image file." To me, that evokes the idea of still having a viewable photograph that doesn't let on that it's hiding data within.

I also once contemplated making a browser extension that actually stores the URL in the image's metadata. I'm not quite sure how this affects user privacy, as the image content might be far more telling than the origin. IMO this compromises the origin of the file...

It looks like those are only links to content hosted elsewhere. I think directly storing images would be a lot more severe.

If the images are hosted on the site instead of a third party the list won't matter.

Good point, I should use relative links for the images. I definitely want to keep the ring open for HTTP; I'm one of those crazies that still takes his Windows 95 for a surf once in a while :D

The deception is really in the URL. If I see an address ending with .jpg, I expect it to load a raw JPEG, and I have some implicit expectations following from that, which are broken if the site serves HTML instead.

Also, it's actually much simpler than the discussion makes it out to be. Just look at the big picture. The user wants a picture, free of irrelevant or harmful bullshit. The site promises that raw image, by means of providing what looks like a direct URL. Then, it does not deliver on the promise, serving the irrelevant and/or harmful bullshit along with the picture. It's pretty clear that one side is trying to exploit the other one.
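The broken expectation described above can be checked mechanically. This is a rough sketch, not how any particular site or browser does it: it compares the extension in the URL with the leading bytes of the response body. The function names and the example URLs are hypothetical; the JPEG and PNG signatures are the standard ones.

```python
def sniff_payload(body: bytes) -> str:
    """Guess the payload type from its first bytes (a deliberately tiny sniffer)."""
    if body.startswith(b"\xff\xd8\xff"):       # JPEG start-of-image marker
        return "jpeg"
    if body.startswith(b"\x89PNG\r\n\x1a\n"):  # PNG signature
        return "png"
    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


def url_keeps_promise(url: str, body: bytes) -> bool:
    """Return False when the URL looks like a direct image link but the body is something else."""
    # Ignore query string and fragment when looking at the extension.
    path = url.split("?", 1)[0].split("#", 1)[0].lower()
    if path.endswith((".jpg", ".jpeg")):
        return sniff_payload(body) == "jpeg"
    if path.endswith(".png"):
        return sniff_payload(body) == "png"
    return True  # no image extension, so no implied promise to check


if __name__ == "__main__":
    # Hypothetical responses for the same .jpg address.
    print(url_keeps_promise("http://example.com/cat.jpg", b"\xff\xd8\xff\xe0..."))
    print(url_keeps_promise("http://example.com/cat.jpg", b"<!DOCTYPE html><html>..."))
```

A real client would also look at the Content-Type header, but the body check is enough to show the mismatch being complained about.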


I am reasonably confident that such an image is "cached" by being part of an HTML document that itself was cached. I mean, the URL for the thing is the content: one would seriously hope that there isn't a URL->data cache somewhere holding that image ;P.

> The problem with that is that the originally uploaded filename is lost. At least without storing it in a separate database.

Sure, but that's a tradeoff nearly every website accepts because they just need the image itself. If you do want to preserve the original filename, is there a reason for not just keeping it in a database?


Thumbnails aren't metadata, they are a cache, since they can always be perfectly and deterministically recreated from the data.

But the actual image data isn't stored in the token, just its URL (usually). So what you own is not the image, but rather, whatever resource resides at a particular URL, with there being no guarantee that that resource doesn't also live at some other URL.

On the other hand, it seems that if you stick that ?dl=1 link into an <img> src, it still downloads and displays.

Surely it would be cached based on the image URL though, not based on the page URL, and different images could use different URLs. I still don't see how this would pose a problem.

Yeah, since it's sending the URLs, Microsoft couldn't get to the actual images in question because they'd be on a private network, behind a login, etc. If the images are hosted on a public domain or IP, that is not on Microsoft.

If you needed such guarantees, I'd reckon you'd want to proxy through a domain name you control, anyways. I wouldn't trust the URL structure of a given image host to last decades, either. Or a given image host's hot-linking tolerance to stay the same.

This is a relatively static image that the browsers should be able to cache after the initial upload. Search results aren't static resources.

I don’t feel so concerned about images being sent to the server, but rather the information derived and collected about the images.

If you're asking why the image data might matter, here are concrete examples of non-public images that people might want to get their hands on:

* Earnings projections (and in general slides from non-public presentations that the user but not the attacking website has access to)

* Medical imaging

That's after thinking about this for 15 seconds. I'm sure there are many more in practice.

Or is that not what you were asking?


I always wonder how this kind of service shows images. Do they hot-link them, or are they saved on their own servers? One approach violates web etiquette and the other seems like over-engineering (saving an image that maybe just one user is going to see).

1) I just downloaded "The Mammographic Image Analysis Society database of digital mammograms" [0] and ran it against the tool [1] image by image. Results below, and I'll be sharing code soon (it's a mess of Python):

  true_pos 36
  true_neg 207
  false_pos 63
  false_neg 16
  total 322

2) How can the site [1] say "We will not store your data on our server. Please don't worry about any privacy issues." when you can find all analyzed mammograms under the "static" directory?

http://mammo.neuralrad.com:5300/static/mamo.jpg

http://mammo.neuralrad.com:5300/static/mammo.jpg

(trying file names at random)

[0] https://www.repository.cam.ac.uk/handle/1810/250394

[1] http://mammo.neuralrad.com:5300/upload
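For readers who want rates rather than raw counts, the numbers reported in 1) can be turned into the usual confusion-matrix metrics. This is only arithmetic on the posted counts, not the commenter's own code, and it assumes "positive" means the tool flagged an abnormality, which the comment does not spell out.

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive standard rates from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "total": total,
        "sensitivity": tp / (tp + fn),  # share of actual positives that were caught
        "specificity": tn / (tn + fp),  # share of actual negatives that were cleared
        "precision": tp / (tp + fp),    # share of positive calls that were right
        "accuracy": (tp + tn) / total,  # share of all calls that were right
    }


if __name__ == "__main__":
    # Counts as posted in the comment above.
    metrics = confusion_metrics(tp=36, tn=207, fp=63, fn=16)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

On those counts this works out to roughly 69% sensitivity, 77% specificity, 36% precision and 75% accuracy, and the four counts do sum to the stated total of 322.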

