Yeah, apparently it uses https://ocr.space/, which is a deal-breaker for me.

a9t9 · on June 20, 2017

I understand that hosted OCR, just like SaaS in general, is not suitable for every use case.

On the other hand, the OCR.space OCR API has a very strict privacy policy:

https://ocr.space/privacypolicy - All uploaded images and the extracted text are deleted immediately after processing.

cantrevealname · on June 20, 2017

> All uploaded images and the extracted text are deleted immediately

Until they are served with a subpoena for a particular client, or a sweeping subpoena to store everything forever, or the company is sold and the new parent has different values, or the company decides to mine customer data for advertising uses, or there's a bug in the software, or there's a long-lived cache of the data, or it gets into their backups accidentally or deliberately, or they don't keep the data but keep "just" the meta-data, or they do statistics or analytics before deleting the data, or they are hacked, or they simply change their minds.

In terms of privacy, even a non-free non-open-source local app with DRM or license management is better than a server app with a "strict privacy policy". With a good firewall setup, you can be pretty sure that the local app won't betray you.

bpicolo · on June 20, 2017

It doesn't seem reasonable to blame them for an arbitrary potential future when they're currently doing the right thing.

icebraining · on June 20, 2017

"The best way to avoid privacy breaches is not to formulate a detailed privacy policy; it's to reduce your capabilities so that you're unable to violate anyone's privacy."

http://www.daemonology.net/blog/2012-01-19-playing-chicken-w...

mnem · on June 20, 2017

No, but the description of the plugin should make it clear that data will be uploaded to a third-party server for recognition, so the user can make a choice about that.

bpicolo · on June 20, 2017

It more or less does.

`For developers: Copyfish is published under the GPL open-source license. As OCR software, it uses the free OCR API from https://ocr.space/ .`

tripzilch · on June 21, 2017

I don't find that clear at all. And this is also important to non-developers.

Also, for nearly all documents I ever need to scan, if they're important enough to require scanning, they're important enough that a third party should have nothing to do with them.

The main exceptions to the above are, ironically, documents without text: sketches, doodles, etc.

m-p-3 · on June 20, 2017

> when they're currently doing the right thing

You mean that we have to place some trust that they are. Some users cannot afford that kind of trust.

stingraycharles · on June 20, 2017

I suggest adding a big notification dialog that explains this when you first try to do an OCR request.

JoblessWonder · on June 20, 2017

Why did you end up going with a .space domain? We blocked that whole TLD because we were getting massive amounts of spam from it when it first came out.

jolmg · on June 20, 2017

Oh dear. My main domain and email are in the .space TLD. I hope your practice is not widespread.

Personally, I chose .space simply because it's cool, cheap, and not overcrowded. It also seems to lend itself well to being part of a name.

I know spam is a hard problem, but I wish you wouldn't label me a spammer simply because of the TLD I chose.

treitnauer · on June 20, 2017

That's one of the problems with cheap domains in the sub-$5 range. Some gTLD registries (.space included) thought it was a good idea to offer them really cheap, but what they got were mostly spammers, which puts you in a bad neighborhood.

There are a few others which you may want to avoid according to this report: https://securityintelligence.com/enticing-clicks-with-spam/

ignoramceisblis · on June 21, 2017

The author's full comment:

> Why did you end up going with a .space domain? We blocked that whole TLD because we were getting massive amounts of spam from it when it first came out.

From your comment:

> I know spam is a hard problem, but I wish you wouldn't label me a spammer simply because of the TLD I chose.

The author is not "labeling you a spammer". They're simply stating a fact about their experience. And in fact, their comment doesn't even mention you.

jolmg · on June 21, 2017

I'm not taking offence nor am I taking it personally. I was hoping my tone was clear on that (i.e. "I know [you have reason], but ..." and "I wish ...", which is just an expression of hope). Sorry if it came off as aggressive.

I only tried to highlight that they have, in effect, labeled everyone in .space (not just me, but me included) as a spammer.

It's heavy-handed, but I understand there are sometimes pressing needs for quick solutions, like when your mailboxes are flooded with spam. Hence, the "I know ..." clause.

superasn · on June 20, 2017

Why is it a deal breaker?

kebman · on June 20, 2017

It's a deal breaker because THAT'S NONE OF YOUR DAMN BUSINESS, and that also goes for Copyfish. It smells fishy to me, and _promises_ never kept prying eyes away from secret documents. People who handle confidential documents should never use SaaS. It's an issue of trust, and Copyfish deserves none.

ghostly_s · on June 20, 2017

Okay, don't use it then. They make no claims of enhanced privacy, and frankly it's unreasonable to presume a service such as this would do all processing locally unless you're paying a premium for that ability. Or did I miss the "Great for confidential documents!" banner? For most people's use cases, this is not a concern.

mnem · on June 20, 2017

It's cheaper for a service to OCR locally than remotely.

a9t9 · on June 20, 2017

There is simply no good OCR engine available that can run inside a Chrome or Firefox extension. The best available is Tesseract.js. And while this engine is fantastic as a project, its recognition rate does not come close to what is available server side.

ocrcustomserver · on June 21, 2017

I agree. There's also ocrad.js.

awqrre · on June 21, 2017

Mozilla should have made an effort to make that OCR code able to run locally... not everything needs the cloud (well, almost nothing does)

ocrcustomserver · on June 21, 2017

If you need a private OCR server that you can host yourself (locally or in the cloud), shoot me an email.

kronos29296 · on June 21, 2017

Did you create this account just to answer this question? I am curious.

ocrcustomserver · on June 21, 2017

Yes. Does it sound like shameless promotion to you? Maybe it is, but some people might have a need for this (and it's relevant to the topic/comment).

kronos29296 · on June 27, 2017

Promotion: yes. Shameless: no. (That's what I think.) I'm just curious. I sure love the idea of this extension. If I ever need something like this, I at least know a handy extension for it now.

drez · on June 21, 2017

They sure did. I also have a sneaking suspicion that it's a9t9, the owner of Copyfish and OCR.space.

johnvonneumann · on June 20, 2017

Can you explain why this is a deal breaker? Is it the use of OCR or the choice of provider? Assume I know nothing here, because I do.

awqrre · on June 21, 2017

It adds a third party to the equation...