Hacker News | melq's comments

Estimating the amount of unique elements in a set and counting the amount of unique elements in a set are very different things. Cool method, bad headline.


They’re not very different things; the terms are used interchangeably in most contexts because in the real world all counting methods have some nonzero error rate.

We talk about ‘counting votes’ in elections, for example, yet when things are close we perform ‘recounts’ which we fully expect can produce slightly different numbers than the original count.

That means that vote counting is actually vote estimating, and recounting is just estimating with a tighter error bound.

I kind of think the mythology of the ‘countless stones’ (https://en.wikipedia.org/wiki/Countless_stones) is a sort of folk-reminder that you can never be too certain that you counted something right. Even something as big and solid and static as a standing stone.

The situations where counting is not estimating are limited to the mathematical, where you can assure yourself of exhaustively never missing any item or ever mistaking one thing’s identity for another’s.


> the terms are used interchangeably in most contexts

Counting and estimating are not used interchangeably in most contexts.

> because in the real world all counting methods have some nonzero error rate.

The possibility that the counting process may be defective does not make it an estimation.

> We talk about ‘counting votes’ in elections, for example, yet when things are close we perform ‘recounts’ which we fully expect can produce slightly different numbers than the original count.

We talk about counting votes in elections because votes are counted. The fact that the process isn't perfect is a defect; this does not make it estimation.

> That means that vote counting is actually vote estimating, and recounting is just estimating with a tighter error bound.

No. Exit polling is estimation. Vote counting is counting. Vote recounting is also counting, and does not necessarily impose a tighter error bound, nor necessarily derive a different number.

> The situations where counting is not estimating are limited to the mathematical, where you can assure yourself of exhaustively never missing any item or ever mistaking one thing’s identity for another’s.

So like, computers? Regardless, this is wrong. Estimating something and counting it are not the same thing. Estimation has uncertainty, counting may have error.

This is like saying addition estimates a sum because you might get it wrong. It's just not true.


So, IEEE floating point doesn’t support ‘addition’ then.


IEEE 754 defines an exact binary result for the addition of any two floats.

That this bit-identical result is not the same operation as addition of real numbers is irrelevant, because floats aren't reals.

f1 + f2 is not an estimation. Even treating it as an approximation will get you into trouble. It's not that either, it's a floating-point result, and algorithms making heavy use of floating point had better understand precisely what f1 + f2 is going to give you if they want to obtain maximum precision and accuracy.
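To make that concrete, here is a minimal Python sketch. It is my own illustration, and it assumes Python floats are IEEE 754 binary64 with the default round-to-nearest mode (true on CPython on common platforms). The helper name `bits` is made up for the example. It shows that f1 + f2 is a fully determined value: the double nearest to the exact rational sum of the two inputs, which need not be the decimal answer you had in mind.

```python
import struct
from fractions import Fraction


def bits(x: float) -> str:
    """Return the 64-bit pattern of a float as a hex string (hypothetical helper)."""
    return struct.pack(">d", x).hex()


f1, f2 = 0.1, 0.2
total = f1 + f2

# The same two inputs always give the same bit pattern.
same_bits = bits(total) == bits(0.1 + 0.2)

# Fraction(f) is the exact rational value of the double f, so this is the
# exact sum of the two doubles, with no rounding at all.
exact_sum = Fraction(f1) + Fraction(f2)

# Assumption: round-to-nearest. Under it, the float result is the double
# closest to the exact sum of the inputs.
correctly_rounded = float(exact_sum) == total

# It is still a different double from the literal 0.3.
differs_from_literal = total != 0.3

print(same_bits, correctly_rounded, differs_from_literal)
```

Nothing here is random or uncertain. The surprise, if any, comes from expecting real-number arithmetic from a type that does not implement it.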


Cool, so next time I have numbers that aren't reals to perform math on, I can use floats.


Or if you have numbers that aren't integers to perform math on, you can use integers.

It's not a new problem, and it isn't specific to floats. Computers do discrete math. Always have, always will.


Come on. There is a fundamental difference between trying to get an exactly correct answer and not trying to get an exactly correct answer.


It’s not a fundamental difference, it’s a fundamental constraint.

There are circumstances - and in real life those circumstances are very common - where you must accept that getting an exactly correct answer is not realistic. Yet nonetheless you want to ‘count’ things anyway.

We still call procedures for counting things under those circumstances ‘counting’.

The constraints on this problem (insufficient memory to remember all the unique items you encounter) are one such situation where even computerized counting isn’t going to be exact.


I agree with you, but we are talking theory here. The algorithm doesn't count, it estimates.

You can make an algorithm that counts, you can make an algorithm that estimates, this is the second.


Estimation is counting with error bars.

Frankly, most of what you consider counting in your comment needs error bars - ask anyone who operated an all-cash cash-register how frequently end-of-day reconciliation didn't match the actual cash in the drawer (to the nearest dollar.)

The following is a list from my personal experience of presumably precisely countable things that turned out not to be: the number of computers owned by a fairly large regional business, the number of (virtual) servers operated by a moderately sized team, the number of batteries sold in a financial year by a battery company.


Counting is a subset of estimation, not a synonym.

If I estimated the number of quarters in a stack by weighing them, that would be different from estimating the number of quarters in a stack by counting them. Both methods of estimation have error bars.

The list you provide is of categories that don't have clear definitions. If you have a sufficiently clear definition for a category given your population, it has a precise count (though your counting methodologies will still be estimates.) If your definition is too fuzzy, then you don't actually have a countable set.


It's close enough to counting for the purposes of a magazine article like uuids are close enough to being unique for the purposes of programming.


The algorithm's accuracy scales with the ratio of memory to set size, so you don't actually know if it is "close enough" without an estimate of the set size.
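For anyone who wants to see that dependence, below is a rough Python sketch of a sampling-based distinct-count estimator. I am assuming the article is about something in the style of the CVM algorithm. This is my own simplified variant, not the paper's exact procedure, and the names `estimate_distinct` and `capacity` are made up for illustration.

```python
import random


def estimate_distinct(stream, capacity, rng=None):
    """Estimate the number of distinct items using at most `capacity` slots.

    Simplified sketch in the style of the CVM estimator (an assumption about
    which algorithm is meant), not a faithful copy of the published version.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    rng = rng or random.Random()
    p = 1.0        # current probability of keeping an item
    kept = set()   # bounded sample of items seen so far
    for item in stream:
        # Forget any earlier decision about this item, then re-decide with
        # probability p, so only its latest occurrence matters.
        kept.discard(item)
        if rng.random() < p:
            kept.add(item)
        # Out of room: drop each kept item with probability 1/2 and halve p.
        while len(kept) >= capacity:
            kept = {x for x in kept if rng.random() < 0.5}
            p /= 2
    # Each surviving item stands in for 1/p distinct items.
    return len(kept) / p


words = ["to", "be", "or", "not", "to", "be"]
print(estimate_distinct(words, capacity=100))  # memory fits everything: exact
print(estimate_distinct(words, capacity=2))    # tiny memory: noisy estimate
```

When `capacity` exceeds the number of distinct items, `p` never drops below 1 and the result is the exact count. Shrink `capacity` relative to the set and the answer becomes a noisier estimate.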

I think the headline is clickbaity and the article makes no effort to justify its misuse of the word 'counting'. The subheadline is far more accurate and doesn't use that many more words.


I think I get your point completely, yet I'm not getting through.

Would you agree that 1+1=2? Or that pi is 3.14159...? These are mathematical truths, but they quickly crumble in the real world. One apple plus one apple doesn't just equate to double the apple; no two apples are ever the same to begin with, and there are no real perfect circles out there either. There is still value in those mathematical truths, in that they make it evident that they are perfectly precise and that it is real-world interaction which may bring error to the table.


Counting and estimation are different by definition. The former is a full enumeration, the latter an extrapolation from sampled data. In both cases 'accuracy' is a factor. Even if we are counting the number of stars, it is still a difference of technique compared to estimating the number of stars.

I could try to count fibers in muscle or grains of sand in the beach, chances are accuracy would be low. One can be smart about technique for more accurate counts, eg: get 10M sand counters and give them each 1kg of sand which they then count the grains with tweezer and microscope. That is counting. At the same time, we could find an average count of grains in 1kg sand from a random 100 of our counters, and then estimate what an expected total would be. The estimate can be used to confirm the accuracy of the counts.
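A toy Python sketch of the two techniques, with invented numbers: each entry in `bags` is the grain count one counter reports for their 1kg of sand. The function names are hypothetical.

```python
import random


def count_all(bags):
    """Counting: enumerate every bag and add up every reported grain count."""
    return sum(bags)


def estimate_total(bags, sample_size, rng):
    """Estimating: average a random sample of bags and extrapolate to all bags."""
    sample = rng.sample(bags, sample_size)
    mean_per_bag = sum(sample) / sample_size
    return mean_per_bag * len(bags)


rng = random.Random(0)
# Made-up data: 1,000 bags with roughly a million grains each.
bags = [rng.randint(900_000, 1_100_000) for _ in range(1000)]

counted = count_all(bags)
estimated = estimate_total(bags, sample_size=100, rng=rng)
print(counted, round(estimated))
```

Both numbers can be wrong. The count is wrong only if a counter slips up, while the estimate carries sampling uncertainty even when every sampled bag is counted perfectly.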


They are not really as far apart as you think. At small numbers, yes, they are distinct. At large enough numbers, they are for all practical purposes the same thing. E.g., what’s the population of the US?


And by that definition this is a counting algorithm.


True - for (relatively) small numbers. For large (huge) numbers estimation is usually considered to be equivalent to counting, and the result is sometimes represented using the "scientific" notation (i.e. "floating-point") rather than as an integer. For example, the mole is an integer whose value is only known approximately (and no one cares about the exact value anyway).


As of May 2019, the mole has an exact value, and Carbon-12's molar mass is the empirically-determined value.


This doesn't justify estimation to be equivalent to counting even if some mathematicians consider them to be the same. Floating points are for estimation. Integers are for counting. The two are not the same, not even for large numbers.


"Equivalent" and "the same" are sometimes equivalent. (Or the same.)

It depends on what the meaning of the word 'is' is.

https://libquotes.com/bill-clinton/quote/lby0h7o


It's an approximation, not an estimation.


Actually, my understanding is that it is an estimation because in the given context we don't know or cannot compute the true answer due to some kind of constraint (here memory or the size of |X|). An approximation is when we use a simplified or rounded version of an exact number that we actually know.


Wikipedia is on your side:

"In mathematics, approximation describes the process of finding estimates in the form of upper or lower bounds for a quantity that cannot readily be evaluated precisely"

This process doesn't use upper and lower bounds.

However, it still seems more like approximation than estimation to me because of this:

“Of course,” Variyam said, “if the [memory] is so big that it fits all the words, then we can get 100% accuracy.”

It seems that in estimation the answer should be unknowable without additional information, whereas in this case it's just a matter of resolution or granularity because of the memory size.

Anyhoo ...

EDIT: also the paper says "estimate" and the article says both "approximate" and "estimate" at different times so it seems everyone except me thinks it's either an estimation or that estimation and approximation are interchangeable.


Still very different things, no?


It's the same thing at different degrees of accuracy. The goal is the same.


Still, counting things and counting unique things are two different procedures.


For someone who's pretty well-versed in English, but not a math-oriented computer scientist, this seems like a distinction without a difference. Please remedy my ignorance.


My GP was wrong, but the words are different.

Estimation is a procedure that generates an estimate, which is a kind of approximation, while an approximation is a result value. They are different "types", as a computer scientist would say. An approximation is any value that is justifiably considered to be nearly exact. ("prox" means "near". See also "proximate" and "proxy".)

Estimation is one way to generate an approximation. An estimate is a subtype of an approximation. There are non-estimation ways to generate an approximation. For example, take an exact value and round it to the nearest multiple of 100. That generates an approximation, but does not use estimation.


I’m not sure the linguistic differences here are as cut and dried as you would like them to be. Estimate and approximate are both verbs, so you can derive nouns from them both for the process of doing the thing, and for the thing that results from such a process.

Estimation is the process of estimating. It produces an estimate.

Approximation is the process of approximating. It produces an approximation.

You can also derive adjectives from the verbs as well.

An estimate is an estimated value.

An approximation is an approximate value.

But you’re right that the ‘approximate’ terms make claims about the result - that it is in some way near to the correct value - while the ‘estimate’ derived terms all make a claim about the process that produced the result (ie that it was based on data that is known to be incomplete, uncertain, or approximate)


The authors of the article disagree with you.


The authors of the paper disagree with me, the authors of the article don't (they use both approximate and estimate, but the paper does say estimate).


From the parent's description it sounds like it just parses log files, doesn't seem like it would augment the behavior of the app generating the logs at all. Curious to know more about why something like that isn't possible on ipadOS though.


In this specific case it's because the log file (if the iPad version of MtGA even generates a "file" for its logs as such) exists in a location only accessible to the app, not to the user or other apps. And the iPadOS model means that there is no way to reach into that app's data to convince it to cough up the data you want.


His first restaurant was called 'Momofuku Noodle Bar', named as an homage to Ando for inventing instant noodles/ramen. I don't think there were many Americans in 2004 (or today) who have even heard of Ando, I don't think he was trying to trade on his name.


Military contractors don't run down to radioshack when they're low on transistors.


In my previous life as an F/A-18 avionics technician I ordered from Mouser and Newark more than a few times when supply didn’t have the parts.


Sometimes they ransack ebay for specific complex chips though...


They would if Radio Shack carried such stuff anymore


How is that comparable to a pacemaker in any way? They aren't made for a similar scale, price, or 'environment', and would only be installed/serviced/dealt with at all by anyone aside from highly trained specialists.


Intentions aside, I read it as a very narrow comparison of the relative durability - I'm guessing they weren't trying to devalue pacemaker engineering.


It sounds like the bluetooth functionality is only there for telemetry.


It's not. It's a full diagnostic interface. Someone with the right software and my serial number could reconfigure it from across the room.

BLE replaces the previous diagnostic interface, which was some form of near-field. You had to have a puck resting within a few inches, going to a several decade old toughbook. My device supports both. It's just in the last couple years that UCLA got the BLE equipment, and sometimes a doctor will whip out the old gear if they feel more confident with it.

When I had the pacemaker first implanted, there was a reliability problem, and they had to do a second operation to fix it. The pacemaker failed to "capture" my ventricle a few times when it should have. It turned out to be a loose lead connection, but the device's impedance diagnostics didn't make the issue immediately obvious. My overall case was weird enough that UCLA did a case study about it, so for the revision procedure they had a vendor rep in the room to help out just in case. She was holding a tablet and pushing buttons that would make my heart temporarily stop.

Now my AV nerves mostly work again, so the pacemaker can't stop my heart if it wanted to. It can only increase my heart rate, and report unusual patterns to my doctor. Also, if someone did somehow mess with it, holding a strong magnet near it will force it into safe mode.


That's fascinating, and very unfortunate how lax the security likely is for an organ keeping you alive.

You would think if you can detect a strong magnet, you could use that to turn the wireless on and off... Like how holding a power button on a phone turns it off, but holding longer can do a factory reset or what have you.

Glad you're doing better since then, though.


An interesting thought would be to have a nano-lead down the arterials to the wrist, where an external telemetry relay-watch could read the signals, and have the BLE device top dermal. (apple watch)

eliminating RF/BLE bullshit from talking to the pacemaker.


Being pragmatic, that sounds way worse than just having a little ceramic 2.4ghz antenna and some extra silicon potted into the device!


https://www.dailymail.co.uk/news/article-2379009/Barnaby-Jac...

-

Oops - I didn't realize you were the same poster from the other comment


I don't think my particular pacemaker has the necessary circuitry to generate more than 5V, in pulses less than a few milliseconds. The voltage doesn't really matter much to the muscle.

If you got in you could probably put the leads into single-ended mode (so that there's more current path to cause mayhem) and pace my atrium and ventricles at 210bpm, and effectively give me a seizure. I can't imagine it would kill me before an EMS arrived with a magnet?

Perhaps a more nuanced attack would be to somehow use all the configuration parameters to intentionally bias the pulses so that there's net charge going into the muscle. Over a long time that would cause tissue damage.

If someone wanted to kill me overtly, a gun would be less work. A pacemaker malfunction that bad would be thoroughly investigated, and would be fixed in new devices within a year or two.


If they were able to cause the pacemaker to fire when they wanted they could time it during the repolarization, which could possibly cause a fatal arrhythmia even in a heart that doesn't need a pacemaker. It's called R-on-T phenomenon and it's usually caused by malfunctioning pacemakers.


I doubt you could do that through configuration changes alone, simply because of how defensive the firmware would be about that exact scenario. You'd probably have to resort to code exploits on top of simply gaining access. Even then, there's probably a rudimentary interlock at the silicon level.


The crazy thing was that this was when there was a lot of talk about Dick Cheney and how he was vuln to this attack -- and there was a lot of spec around if barnaby was silenced because it was the older, Cheney-esque politicians that could be taken out by this vector...

Perhaps, he got the 'reverse bounty' on this bug...


How would you 'remove' SEO bias?


You apply an arbitrary filter that deranks 'shitty' sites, and then you play a semantic game about what 'bias' is, and how you don't have it, but all the competing search engines do.

There's no objective measure of what site is 'good', and what site has been 'SEO-gamed' to rise to the top of <arbitrary search engine's algorithm>. The measures are subjective, and people complain when changes to the algorithm aimed at punishing black-hat-SEO also punish their website (Rightly or wrongly).

Not to mention the problem of using black-hat-SEO to punish a competitor (By creating scummy links to them, that make it look like they are trying to game the search engine.)


Growing up, my mother always used to give me books that contained content on page one. This goggle really brings back memories of wandering the dusty library aisles of Mediterranean street markets with their pungent spices and beautifully crafted rugs.

... 100,000 words later ...

I don't think it will be very hard to get right.


I personally ? would remove SEO-bias by first exposing the actual construction of the site without SEO add-ons, to some engine for consideration. A simple example would be "built with wordpress". Please note that it is wise to know that I do not know, many many things. So it becomes a networked endeavor, to find and identify "literal attributes" to sites for the use of the engine, not my personal opinion of what the web-o-sphere is in 2022.

A very very significant example is the primary language group exposed on the site.. for example, Cantonese ? I personally support the rights of "minority" languages like Gaelic and Welsh in the European setting, Tamil for South Asia, things like that.. make it so..


I agree, the ability to reliably filter out overly-SEO sites, copy-spam, and other search engine-bait would be a true game changer! Defining and detecting it is of course the hard part.


My point is that as long as there is an algorithm, it will be possible to optimize for it.


The advantage that Brave and co have is that SEO sites will largely not be willing to make changes that decrease their Google rank to improve their Brave rank. If Google changes their algorithm, everyone will optimise for the new one, but as long as the Google algorithm is there and the main priority, Brave may be able to extract some useful signal.


A good portion of the largest insurance companies are non-profits, look at Blue Cross Blue Shield and their affiliated companies. They still make tons of money, keep tons of cash on hand, enjoy the same high salaries as for-profits (not saying they necessarily shouldn't), and get special tax statuses/breaks.

And the for-profit ones are making plenty of money whatever regulations they're subject to:

>During 2010, Health Care Service Corporation, the parent company of BCBS in Texas, Oklahoma, New Mexico, Montana and Illinois, nearly doubled its income to $1.09 billion in 2010, and began four years of billion-dollar profits.

I'm not saying they're villains, but "they're going to make money no matter what" isn't a compelling argument to me, and I have precisely 0 faith in the government to meaningfully regulate them.


You don't seem to understand the difference between a design choice you don't like, versus an objectively bad design.

When Apple removed the headphone jack from iPhones, (supposedly) in favor of making them waterproof, that was a design choice that many people rightfully disliked. However, we can imagine there was also a population of people who preferred such a tradeoff. For instance: lifeguards, sea-world trainers, deaf people. It's not a 'flaw' simply because you don't prefer it.

If you are someone who buys a diving watch, but isn't a deep sea diver, you aren't justified in complaining that the watch isn't solar powered. There are people who require (or at least prefer) such a watch, and if you aren't among them, don't buy it.

You don't get to decide what is and isn't a flaw based on your arbitrary use cases. An actual flaw is an implementation detail that is not consistent with the design specs. If Apple set out to make a phone with a working headphone jack, and the resulting product didn't have one, that's a flaw. If they set out to make a phone without a headphone jack, then it isn't a design flaw if the resulting phone doesn't have one.


You don't seem to understand that there is no arbiter of objectivity that we can rely on and that deciding what is a flawed design and what is not is completely subjective.

Removing the headphone jack "for waterproofing" is obviously defective for me because other phones that are waterproof didn't have to remove the jack and I had to buy new headphones. It wasn't for Apple though - they designed it that way so they could sell more headphones. That's how subjectivity works.

Who complained about deep sea watches not being solar powered? I don't have to make up situations to prove my point like a weasel. I just point them out as I see them and the examples I've referenced have been widely held as flaws in Apple's designs.

I do get to decide what is a flaw based on intended use cases as stated by Apple. Nobody made up any arbitrary use cases here.


> i would be pretty irritated if someone ripped off my idea after working with me

Are you sure he really 'ripped off' anyone in the first place? The first thing I thought of when I read this article was 'that sure seems like the same thing Jupyter was made for'. A quick google shows that Jupyter was a couple years ahead of Repl.it, and I doubt that Jupyter was the first to come up with a web app REPL shell in the first place.


dude literally emailed the ceo and was like yeah i used this component, but what else could i have used, and yeah i used this other thing i learned about from working there but its popular! and i used yet another thing but you guys werent using it when i worked there.

im not saying any of this is illegal. just weird to copy your previous employer's tech stack, open source it, and try to play the victim and clickbait HN.

