teddyknox's comments

I wonder if machine learning could be used to model the probability distributions with greater capacity, and thus reduce backtracking.

One might also consider placing the algorithm in the context of a generative adversarial network (GAN), so that the tile-probability-modeling ML component is adapted towards output that is less distinguishable from a real city.
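Very roughly, a sketch of what that learned tile-probability component might look like (the tile vocabulary, context size, and the single random linear layer standing in for a trained network are all hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)

    NUM_TILES = 16   # hypothetical tile vocabulary size
    CONTEXT = 8      # hypothetical number of neighbouring cells fed to the model

    # Stand-in for a trained model (e.g. the generator half of a GAN): a single
    # linear layer mapping one-hot neighbour tiles to logits over the next tile.
    W = rng.normal(scale=0.1, size=(CONTEXT * NUM_TILES, NUM_TILES))

    def tile_distribution(neighbour_tiles):
        """Predict a probability distribution over the next tile from its neighbours."""
        x = np.zeros((CONTEXT, NUM_TILES))
        x[np.arange(CONTEXT), neighbour_tiles] = 1.0   # one-hot encode the context
        logits = x.reshape(-1) @ W
        probs = np.exp(logits - logits.max())          # softmax
        return probs / probs.sum()

    def sample_tile(neighbour_tiles):
        """Sample the next tile; a sharper learned distribution should mean fewer dead ends to backtrack from."""
        return int(rng.choice(NUM_TILES, p=tile_distribution(neighbour_tiles)))

    print(sample_tile([0, 3, 3, 1, 0, 7, 7, 2]))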


How'd you learn that you have this condition?


Did vorinostat do anything for this? One of your comments on this 6 months ago got indexed by a search engine and I just found it.


Love the modesty implicit in the short post.


Death metal is my go-to work music for when the fight with entropy feels like a losing battle. Especially when the rest of our capitalist society tells you to be happy all the time, death metal offers the intuitive sense that you're not alone in the pain, and some solace in knowing that some people are brave enough to lament their difficulty out loud.

I think catharsis is really important and people should teach themselves to seek it the way they seek their professional goals, food and companionship.


Why do BBC articles do so well here?


Ello.co is stealing their metaphorical lunch.


Sometimes I use a 26-word language and it sure seems capable of saying everything...


Maybe that's a 26-character writing system?


Maybe it's finger spelling. :)


I'm more comfortable using just two words.


Conventional processors will eventually run into physical limits of heat dissipation that will impede Moore's law. Reversible computers are a way of circumventing this physical limitation because they dissipate less energy as heat.
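For a sense of scale, the floor in question is Landauer's limit; a quick back-of-the-envelope in Python (room temperature assumed):

    from math import log

    # Landauer's principle: erasing one bit of information must dissipate at least
    # k_B * T * ln(2) joules as heat. Irreversible gates (AND, OR, ...) erase bits;
    # reversible gates (e.g. Toffoli) need not, which is the loophole reversible
    # computing aims to exploit.
    K_B = 1.380649e-23   # Boltzmann constant, J/K
    T = 300.0            # assumed room temperature, K

    limit = K_B * T * log(2)
    print(f"Landauer limit at {T:.0f} K: {limit:.2e} J per erased bit")   # ~2.87e-21 J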


When I think exabyte-scale queries on a columnar datastore I think aggregations, but then I have this question: why do we need to do exabyte-scale queries in the first place? Wouldn't statistical inference via random sampling be faster and accurate enough?

(Granted, aggregations often happen after some filtering, at which point the relation being aggregated might be considerably smaller than exabyte scale.)
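A toy version of the sampling argument (the column, sizes, and distribution below are invented purely for illustration):

    import random
    import statistics

    random.seed(42)

    # Stand-in for one numeric column of a very large table.
    population = [random.expovariate(1 / 250.0) for _ in range(5_000_000)]

    # Estimate the mean from a small uniform random sample instead of a full scan.
    sample = random.sample(population, 10_000)
    mean = statistics.fmean(sample)
    stderr = statistics.stdev(sample) / len(sample) ** 0.5

    print(f"estimated mean: {mean:.1f} +/- {1.96 * stderr:.1f} (95% CI)")
    print(f"true mean:      {statistics.fmean(population):.1f}")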


Redshift is designed to fill the classic accounting data warehouse role in an organisation. Whilst I'm sure there aren't too many companies with account ledgers that large (or any), I doubt too many accountants would be happy with statistical inference of their books... ;)

This new model of processing directly on S3 is pretty much aimed specifically at eliminating the "Load" part of the ETL process. Just dump to CSV from whatever sources you originally had, and don't worry about schema conversion or loading into a DB. The fact that it happens to scale to exabytes is just good marketing fluff.
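To make the "query the CSV where it lands" idea concrete, here is a rough sketch using S3 Select from boto3 as a stand-in (Redshift Spectrum itself is configured with SQL DDL for external tables rather than an SDK call; the bucket, key, and column names below are hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Run a simple aggregate directly against a CSV object in S3, with no load step.
    response = s3.select_object_content(
        Bucket="example-raw-dumps",                     # hypothetical bucket
        Key="ledger/2017-04.csv",                       # hypothetical object
        ExpressionType="SQL",
        Expression=(
            "SELECT SUM(CAST(s.amount AS FLOAT)) "
            "FROM S3Object s WHERE s.account = 'revenue'"
        ),
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    # The result comes back as an event stream; Records events carry the rows.
    for event in response["Payload"]:
        if "Records" in event:
            print(event["Records"]["Payload"].decode())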


Yeah, I think filtering is a big part of it. If you want to answer a statistical question about the entire dataset, then a random sample is probably good enough. If you want to drill down and do an analysis that only looks at a particular narrow slice of the data, then it's likely that the corresponding subset of your sample isn't big enough to be meaningful.

(You can pre-filter or pre-aggregate before sampling, but that assumes you know a priori what types of queries you'll want to do.)
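A quick simulation of why the drill-down case breaks a uniform sample (the slice rate and sample rate are made up):

    import random

    random.seed(7)

    ROWS = 10_000_000
    RARE_SLICE_RATE = 0.0005   # the slice you want to drill into is 0.05% of rows
    SAMPLE_RATE = 0.001        # a 0.1% uniform sample of the whole table

    rows_in_slice = sum(random.random() < RARE_SLICE_RATE for _ in range(ROWS))
    slice_rows_in_sample = sum(random.random() < SAMPLE_RATE for _ in range(rows_in_slice))

    print(f"rows in the rare slice:        {rows_in_slice}")         # ~5,000
    print(f"of those, rows in the sample:  {slice_rows_in_sample}")  # ~5 -- far too few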


It really depends on what you are doing. A large data set shouldn't be limited to longitudinal analysis. If you're storing every log record or every stock bid/ask, there may be times when you need to understand the specifics of exactly what was going on. There may be a lot of filtering on the underlying corpus for these sorts of exact-match queries, but data set sizes continue to grow.


That said, I agree that approximate functions should be part of a modern database system. Redshift has approximate count distinct (based on HyperLogLog) and approximate percentiles (based on quantile summaries).
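For anyone unfamiliar with the trick, here is a toy HyperLogLog in plain Python; it illustrates the idea only (it is not Redshift's implementation, and the parameters are arbitrary):

    import hashlib
    import math

    def hash64(value):
        """Stable 64-bit hash of the input string."""
        return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

    class TinyHLL:
        """Minimal HyperLogLog: 2**p registers, each storing the longest run of leading zeros seen."""

        def __init__(self, p=12):
            self.p = p
            self.m = 1 << p
            self.registers = [0] * self.m

        def add(self, value):
            h = hash64(value)
            idx = h >> (64 - self.p)                       # first p bits pick a register
            rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
            rank = (64 - self.p) - rest.bit_length() + 1   # position of the leftmost 1-bit
            self.registers[idx] = max(self.registers[idx], rank)

        def estimate(self):
            alpha = 0.7213 / (1 + 1.079 / self.m)          # standard bias constant for large m
            raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
            zeros = self.registers.count(0)
            if raw <= 2.5 * self.m and zeros:              # small-range correction (linear counting)
                return self.m * math.log(self.m / zeros)
            return raw

    hll = TinyHLL()
    for i in range(100_000):
        hll.add(f"user-{i}")
    print(round(hll.estimate()))   # roughly 100,000, within a few percent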


See BlinkDB.

