Show HN: Extract main ideas in your texts with SummarizeBot

michaelmior · on Nov 27, 2018

> Hi, I am AI and Blockchain-Powered SummarizeBot!

I assume the expected reaction is "Cool! They use blockchain." My reaction is more like, "What on earth does this have to do with blockchain?"

> We apply decentralized architecture to train and test our AI models. Using blockchain technology helps us not only to get more training data but also to improve the trustworthiness of our algorithms.

I still don't understand what this has to do with blockchain.

libdjml · on Nov 27, 2018

> I still don't understand what this has to do with blockchain.

Funding from VCs?

codegeek · on Nov 28, 2018

True story. I met a guy who was looking for a tech co-founder and had an idea that would use blockchain. I challenged him on the reason behind the use of blockchain and he literally said "Investors like to hear buzzwords and at this time, I am looking to raise funding". His idea wasn't bad but he couldn't justify using blockchain to implement it. I pressed him further and asked if he was willing to implement the idea without blockchain and he just wasn't interested. He said "blockchain is the hot thing right now and I want to build this using blockchain". So in other words, it was more important for him that blockchain was used than focussing on the actual idea.

adtac · on Nov 28, 2018

Honestly, that's hysterical even for HBO's Silicon Valley. As someone who has never lived in or experienced Silicon Valley, I always assumed the show was a bit of an exaggeration. But people like this are proof that it's not. It's a documentary with a comedic element.

miketery · on Nov 28, 2018

Assume not, many of the moments in the show have been lived by myself and others. More so - I think the reality is that the truth goes even further than the fictional works we see.

emteycz · on Nov 28, 2018

I know people like that. They grew up from the students who did everything exactly like the teacher said, even if there was an obvious (honest) mistake.

rajacombinator · on Nov 28, 2018

Did he get the funding though?

TeMPOraL · on Nov 28, 2018

Yes. A friend works in a company where management started asking devs if they could use blockchain somewhere in the existing product (that has absolutely zero need for it), because they believed that being able to say they use "blockchain" will make it easier to get funding.

js4 · on Nov 28, 2018

rmbeard · on Nov 28, 2018

Blockchain is everywhere now, it's even built in to MS-Office with this service https://www.microsoft.com/developerblog/2017/04/10/stampery-... so why not something like a summarizer as well. I am just waiting for the office admin people to ask me to verify my documents with this.

yashap · on Nov 27, 2018

Most likely a dev wanting to play with cool new tech for not-very-good reasons, and no adult supervision preventing this.

michaelmior · on Nov 28, 2018

Possibly. There's absolutely nothing wrong with building something with new tech just for fun, but when you're trying to build a business of course you have to think about whether it actually adds value.

txcwpalpha · on Nov 28, 2018

A couple thoughts:

1. Summarizing entire bodies of text down into "bite-sized" chunks isn't inherently a good thing. It seems the main use case (and at least the one suggested in the demo) is to be used for news articles. Now, I'm totally understanding of the fact that not everyone has the time to read every news article, but as it is, only reading part of the article (or more commonly, only reading the headline) is a huge issue with current consumption of content. This attempt to further summarize articles into small, context-less bites seems to be going in the wrong direction.

2. On the demo page, there is a "Fake News Detection" feature. I threw a couple of articles at it and it left me with so many questions I don't even know where to begin. For a few articles, it just gave me a binary "Real:1 , Fake:0" output. For others, it spit out a couple of numbers for stats like "conspiracy", "irony", "bias", "pseudoscience". Why are these the attributes chosen to measure? How are they calculated? Is something like "irony" even meaningful when trying to detect fake news?

Viewing the documentation section of the site, there is a small blurb claiming that it uses "custom AI classifiers", "custom machine learning models trained on fake and biased articles", and "database of trusted and biased websites created by our experts" to calculate these numbers. AKA, there is absolutely zero meaningful explanation as to how these numbers are calculated and why they should be trusted. This entire feature is a complete black box, and for all we know, the "database of trusted websites" could be created by Russian spies trying to sow misinformation.

TeMPOraL · on Nov 28, 2018

Here's a summarization I'd like (and would pay for[0]): chats. I mean IRC logs, Slack & Telegram groupchat logs, etc. Between work and local Hackerspace, people produce so much text on IM that I can't really keep up with it. I'd love a solution that I could feed such chat logs, and get back e.g. list of topics covered.

--

[0] - as long as it wasn't a cloud SaaS where I have to share my data with vendor's machines.

charris0 · on Nov 28, 2018

Totally agree. I tried to prototype this, but the datasets to train ML summarization from are pretty much all from news articles. Trying to take that model and summarize chats resulted in gibberish for me. The salient take-aways from chats and other non-factual / loosely structured text seem so dependent on what you care about that summarisation is difficult.

lfx · on Nov 28, 2018

Same thoughts:

1) I did try on https://www.reuters.com/article/us-southkorea-prisonstay-idU... and from 20% and 40% summary I have no clue of the actual meaning of the article. It seems that it just gets some sentences out, but they don't really combine in summary as a whole.

2) And seems like it has some fake news element as well "fake: 0.343". Confusing.

To sum up my experience: I confused, real: 0.9; fake: 0.1.

Anyhow, it seems that it has some great potential and may be useful for summarizing general-knowledge facts in the future.

etaioinshrdlu · on Nov 27, 2018

Is it abstractive or extractive?

What exactly does this have to do with blockchain?

The landing page is more marketing than technical and may not really be a good fit for this site.

BjoernKW · on Nov 27, 2018

From their "Technologies We Use" page ( https://www.summarizebot.com/summarization_business.html#tec... ) I gather that they use a Blockchain for storing their training data and language models.

In theory that makes the language models auditable and tamper-proof. I'm not so sure about the supposed benefit of that, though. Yes, it means that the model itself cannot be tampered with (in order to introduce bias to the summaries, for instance) but as long as the algorithm itself remains closed source you could still alter the results by for example boosting some values while attributing less significance to others.

Simply publishing both the algorithm and the model as open source, along with a SHA-2 hash to make sure neither has been tampered with, would achieve a lot more in terms of reproducibility and trustworthiness.
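For what it's worth, checking a published hash takes only a few lines. The sketch below assumes SHA-256 (one member of the SHA-2 family) and uses hypothetical helper names (`sha256_hex`, `sha256_of_file`, `matches_published_hash`); it is an illustration of the idea, not anything SummarizeBot is known to do.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of in-memory bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a (possibly large) model file in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(data: bytes, published_hex: str) -> bool:
    """True if the data hashes to the digest the publisher announced."""
    return hmac.compare_digest(sha256_hex(data), published_hex.lower())
```

Anyone who downloads the model can recompute the digest and compare it with the one published next to the source code. The obvious caveat is that the hash has to be distributed over a channel the downloader already trusts.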

Then again, they would've had one buzzword less in that case ...

therein · on Nov 28, 2018

Right but you could also commit your changes to your model into a Git repo and use the "blockchain" that Git provides.

When people say blockchain, the implication is that there is a distributed consensus involved. In this case, there is no reason for a distributed consensus on ordering or anything.

But if you are suggesting there are many text parsers that train the model, and there is a central model that's held by the network state, sure. But I don't know what the benefit of that is, as I don't think simply training on more text will allow this bot to produce better summaries.
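To make the Git comparison concrete: each commit ID depends on its parent's ID, so history forms a hash chain with no consensus protocol anywhere. The sketch below is a simplified illustration with hypothetical names (`commit_id`, `build_chain`). It uses SHA-256 over parent ID plus content, which is not Git's actual object format.

```python
import hashlib


def commit_id(parent_id: str, content: bytes) -> str:
    """Derive an ID from the parent's ID and this version's content."""
    digest = hashlib.sha256()
    digest.update(parent_id.encode("ascii"))
    digest.update(b"\0")  # separator between parent ID and content
    digest.update(content)
    return digest.hexdigest()


def build_chain(versions: list) -> list:
    """Return one ID per version; each ID commits to all earlier versions."""
    ids = []
    parent = ""  # the first version has no parent
    for content in versions:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids
```

Changing any earlier model version changes every later ID, so publishing the latest ID is enough to pin the whole history. That gives tamper evidence without the distributed ordering that a blockchain exists to provide.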

miketery · on Nov 28, 2018

My heuristic here is that if they don't say abstractive then it must be extractive. Simply because the former is so much more difficult.
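For anyone unfamiliar with the distinction: an extractive summarizer only selects sentences that already exist in the text, while an abstractive one writes new sentences. Below is a minimal extractive sketch that scores sentences by average word frequency. The function names and the `ratio` parameter are hypothetical, and this is a textbook baseline, not a claim about how SummarizeBot works.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list:
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def extractive_summary(text: str, ratio: float = 0.4) -> list:
    """Keep the highest-scoring sentences, returned in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Word frequencies over the whole document.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:keep]
    return [sentences[i] for i in sorted(ranked)]
```

Because the output is just a subset of the input sentences, nothing guarantees that the picked sentences read as a coherent whole, which is the main weakness of this kind of approach.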

anon1253 · on Nov 27, 2018

As others have mentioned, I have no idea what any of this has to do with blockchain, or how it could even conceivably help with anything other than riding a hype train. That being said, I would be curious to see how it holds up against the standard state of the art (e.g. https://github.com/sebastianruder/NLP-progress/blob/master/e...), as summarization (especially the abstractive kind) is very hard but also has a surprising number of useful applications.

diegoperini · on Nov 28, 2018

Is Show HN becoming more and more a "roast my buzzword aggregator project" tool or is it my selection bias?

wj · on Nov 28, 2018

I guess it is like that software that Yahoo bought from a 15 year-old that summarized articles:

https://gizmodo.com/yahoo-shutters-that-30-million-app-it-bo...

melicerte · on Nov 28, 2018

This page. Summarized. For your consideration.

https://www.summarizebot.com/api/378d4eec8d0e4ddeb8142c84433...

bryanrasmussen · on Nov 28, 2018

What kind of a person writes enough that it becomes useful to summarize what they write, but is incapable of summarizing what they have written so that a summarizebot becomes useful to them?

proxygeek · on Nov 28, 2018

It probably would be more helpful on the pull side rather than the push side of the channel. Useful for consumption of content in certain situations by certain segments of readers.

Say, someone scanning through a list of legal docs to identify the most relevant ones. A short blurb would be pretty helpful.

bryanrasmussen · on Nov 28, 2018

sure, I was obliquely referring to the headline - Extract main ideas in YOUR texts... - seemed to prompt the question.

on edit: fixed misspelling, just woke up from nap.

rmbeard · on Nov 28, 2018

I would find that useful, coming from a background writing somewhat abstruse mathematics and now having to write for ordinary people. Summarizer tools seem to offer a useful productivity tool.

defertoreptar · on Nov 28, 2018

It's great for long, rambling forum posts that keep hitting on the same points, and you want to find out if it's worth the time reading it.

amelius · on Nov 27, 2018

Are there any benchmarks for this kind of task, and how well does this tool perform w.r.t. them?

team-o · on Nov 28, 2018

In terms of measurement, there's ROUGE ( https://en.wikipedia.org/wiki/ROUGE_(metric) ). In this paper they discuss a few benchmarks in their introduction and describe a new benchmark: https://arxiv.org/pdf/1808.08745.pdf
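As a rough illustration of what ROUGE-N measures, here is a minimal sketch: n-gram overlap between a candidate summary and a reference summary, reported as precision, recall and F1. The function name `rouge_n` and the whitespace tokenization are assumptions for the example. Published evaluations use dedicated ROUGE tooling with stemming and other options, so scores from this sketch are not comparable to paper numbers.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Compute ROUGE-N precision, recall and F1 using clipped n-gram counts."""

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Recall is the classic ROUGE number: the fraction of the reference's n-grams that the candidate recovered. Calling `rouge_n(candidate, reference, n=2)` gives the bigram variant.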

abhisuri97 · on Nov 28, 2018

Idk about the blockchain stuff, but it is semi-featureful since it can extract text from images (I'm assuming it's just using Tesseract OCR. At least it could get the text out of this http://www.antigrain.com/research/font_rasterization/msword_...), and I assume audio (but I am having difficulty testing it out). Tbh the formats it accepts as well as the fact that it is available over messenger are huge "selling" points for me and should be way more emphasized on the product page than blockchain and AI.

ArtWomb · on Nov 28, 2018

The end goal is wider distribution of content. Here, for example, are summaries (json) of all 1008 academic papers accepted to NeurIPS 2018:

https://github.com/contentinnovation/NeurIPS-2018-papers

It would be really cool to be able to translate, via ML, high-level AI progress into standard American journalistic English. To some extent Bloomberg TicToc and Jinri Toutiao are already generating short-form video for breaking news stories.

anotheryou · on Nov 28, 2018

Why not have a paste text field for a quick demo on the landing page? Best pre-filled (some cherry picking is ok :) ).

decentralised · on Nov 28, 2018

I'd be curious to learn more about the use case for blockchain here.

For instance, Ocean Protocol plans to use TCRs for data quality, meaning that data providers and data consumers evaluate the quality of a dataset in a continuous way, so that certain assets can be moved up/ down the ranks in near real time.

pwaai · on Nov 28, 2018

[flagged]

dang · on Nov 28, 2018

Could you please stop posting unsubstantive comments to Hacker News? We've already asked you this.

pwaai · on Nov 28, 2018

Somebody downvoted you, it seems. I think somebody may be downvoting all my comments.

Edit: seriously? Who hurt you

pwaai · on Nov 28, 2018

Ok....

dominotw · on Nov 27, 2018

can i see an example