
Yeah apparently it uses https://ocr.space/, deal-breaker for me.



I understand that hosted OCR, just like SaaS in general, is not suitable for every use case.

On the other hand, the OCR.space OCR API has a very strict privacy policy:

https://ocr.space/privacypolicy - All uploaded images and the extracted text are deleted immediately after processing.


> All uploaded images and the extracted text are deleted immediately

Until they are served with a subpoena for a particular client, or a sweeping subpoena to store everything forever, or the company is sold and the new parent has different values, or the company decides to mine customer data for advertising uses, or there's a bug in the software, or there's a long-lived cache of the data, or it gets into their backups accidentally or deliberately, or they don't keep the data but keep "just" the meta-data, or they do statistics or analytics before deleting the data, or they are hacked, or they simply change their minds.

In terms of privacy, even a non-free non-open-source local app with DRM or license management is better than a server app with a "strict privacy policy". With a good firewall setup, you can be pretty sure that the local app won't betray you.


It doesn't seem reasonable to blame them for an arbitrary potential future when they're currently doing the right thing.


"The best way to avoid privacy breaches is not to formulate a detailed privacy policy; it's to reduce your capabilities so that you're unable to violate anyone's privacy."

http://www.daemonology.net/blog/2012-01-19-playing-chicken-w...


No; however, the description of the plugin should make it clear that data will be uploaded to a third-party server for recognition, so the user can make a choice about that.


It more or less does.

`For developers: Copyfish is published under the GPL open-source license. As OCR software, it uses the free OCR API from https://ocr.space/ .`


I don't find that clear at all. And this is also important to non-developers.

Also, for nearly all documents I ever need to scan, if they're important enough to require scanning, they're important enough that a third party should have nothing to do with them.

The majority of exceptions to the above being, ironically, documents without text, sketches, doodles, etc.


> when they're currently doing the right thing

You mean that we have to place some trust that they are. Some users cannot afford that kind of trust.


I suggest adding a big notification dialog that explains this when you first try to do an OCR request.


Why did you end up going with a .space domain? We blocked that whole TLD because we were getting massive amounts of spam from it when it first came out.
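For readers unfamiliar with the practice, blocking a whole TLD at the mail filter amounts to something like the minimal sketch below. It assumes the filter only looks at the sender address; the function names and the blocklist are hypothetical, not taken from any particular mail server.

```python
def sender_tld(address: str) -> str:
    """Return the lowercase top-level domain of an email address's domain part."""
    # Take everything after the last "@", drop a trailing root dot, normalize case.
    domain = address.rsplit("@", 1)[-1].strip().rstrip(".").lower()
    # The TLD is the last dot-separated label of the domain.
    return domain.rsplit(".", 1)[-1]


# Hypothetical blocklist: every sender under these TLDs is rejected,
# regardless of who actually owns the individual domain.
BLOCKED_TLDS = {"space"}


def is_blocked(address: str, blocked_tlds: set = BLOCKED_TLDS) -> bool:
    """True if the sender's TLD is on the blocklist."""
    return sender_tld(address) in blocked_tlds
```

With this rule, `is_blocked("me@example.space")` is `True` and `is_blocked("me@example.com")` is `False`: the check is cheap and stops the spam flood, but it also rejects every legitimate .space sender.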


Oh dear. My main domain and email are in the .space TLD. I hope your practice is not widespread.

Personally, I chose .space simply because it's cool, cheap, and not overcrowded. It also seems to lend itself well to being part of a name.

I know spam is a hard problem, but I wish you wouldn't label me a spammer simply because of the TLD I chose.


That's one of the problems with cheap domains in the sub-$5 range. Some gTLD registries (.space included) thought it was a good idea to offer them really cheap, but what they attracted was mostly spammers, which puts you in a bad neighborhood.

According to this report, there are a few other TLDs you may want to avoid: https://securityintelligence.com/enticing-clicks-with-spam/


The author's full comment:

> Why did you end up going with a .space domain? We blocked that whole TLD because we were getting massive amounts of spam from it when it first came out.

From your comment:

> I know spam is a hard problem, but I wish you wouldn't label me a spammer simply because of the TLD I chose.

The author is not "labeling you a spammer". They're simply stating a fact about their experience. And in fact, it doesn't even mention you.


I'm not taking offence, nor am I taking it personally. I was hoping my tone was clear on that (i.e. "I know [you have reason], but ..." and "I wish ...", which is just an expression of hope). Sorry if it came off as aggressive.

I only tried to highlight that they have, in effect, labeled everyone in .space (not just me, but me included) as a spammer.

It's heavy-handed, but I understand there are sometimes pressing needs for quick solutions, like when your mailboxes are flooded with spam. Hence the "I know ..." clause.


Why is it a deal breaker?


It's a deal breaker because THAT'S NONE OF YOUR DAMN BUSINESS, and that also goes for Copyfish. It smells fishy to me, and _promises_ have never kept prying eyes away from secret documents. People who handle confidential documents should never use SaaS. It's an issue of trust, and Copyfish deserves none.


Okay, don't use it then. They make no claims of enhanced privacy, and frankly it's unreasonable to presume a service such as this would do all processing locally unless you're paying a premium for that ability. Or did I miss the "Great for confidential documents!" banner? For most people's use cases, this is not a concern.


It's cheaper for a service to OCR locally than remotely.


There is simply no good OCR engine available that can run inside a Chrome or Firefox extension. The best available is Tesseract.js. And while this engine is fantastic as a project, its recognition rate does not come close to what is available server side.


I agree. There's also ocrad.js.


Mozilla should have made an effort to make that OCR code run locally... not everything needs the cloud (well, almost nothing does).


If you need a private OCR server that you can host yourself (locally or on the cloud), shoot me an email.


Did you create this account just to answer this question? I am curious.


Yes. Does it sound like shameless promotion to you? Maybe it is, but some people might have a need for this (and it's relevant to the topic/comment).


Promotion - Yes. Shameless - No. (This is what I think.) I am just curious. I sure love the idea of this extension. If I need to use something like this, I at least know a handy extension for it now.


They sure did. I also have a sneaking suspicion that it's a9t9, the owner of copyfish and ocr.space.


Can you explain why this is a deal breaker? Is it the use of OCR or the choice of provider? Assume I know nothing here, because I do.


It adds a third party to the equation...



