> Hi, I am AI and Blockchain-Powered SummarizeBot!
I assume the expected reaction is "Cool! They use blockchain." My reaction is more like, "What on earth does this have to do with blockchain?"
> We apply decentralized architecture to train and test our AI models. Using blockchain technology helps us not only to get more training data but also to improve the trustworthiness of our algorithms.
I still don't understand what this has to do with blockchain.
True story. I met a guy who was looking for a tech co-founder and had an idea that would use blockchain. I challenged him on the reason behind the use of blockchain and he literally said "Investors like to hear buzzwords and at this time, I am looking to raise funding". His idea wasn't bad but he couldn't justify using blockchain to implement it. I pressed him further and asked if he was willing to implement the idea without blockchain and he just wasn't interested. He said "blockchain is the hot thing right now and I want to build this using blockchain". So in other words, using blockchain was more important to him than focusing on the actual idea.
Honestly, that's hysterical even for HBO's Silicon Valley. As someone who has never lived in or experienced Silicon Valley, I always assumed the show was a bit of an exaggeration. But people like this are proof that it's not. It's a documentary with a comedic element.
Assume not. Many of the moments in the show have been lived by myself and others. More so, I think the reality goes even further than the fictional works we see.
Yes. A friend works in a company where management started asking devs if they could use blockchain somewhere in the existing product (that has absolutely zero need for it), because they believed that being able to say they use "blockchain" will make it easier to get funding.
Blockchain is everywhere now. It's even built into MS-Office with this service https://www.microsoft.com/developerblog/2017/04/10/stampery-... so why not something like a summarizer as well. I am just waiting for the office admin people to ask me to verify my documents with this.
Possibly. There's absolutely nothing wrong with building something with new tech just for fun, but when you're trying to build a business of course you have to think about whether it actually adds value.
1. Summarizing entire bodies of text down into "bite-sized" chunks isn't inherently a good thing. It seems the main use case (and at least the one suggested in the demo) is to be used for news articles. Now, I'm totally understanding of the fact that not everyone has the time to read every news article, but as it is, only reading part of the article (or more commonly, only reading the headline) is a huge issue with current consumption of content. This attempt to further summarize articles into small, context-less bites seems to be going in the wrong direction.
2. On the demo page, there is a "Fake News Detection" feature. I threw a couple of articles at it and it left me with so many questions I don't even know where to begin. For a few articles, it just gave me a binary "Real:1 , Fake:0" output. For others, it spit out a couple of numbers for stats like "conspiracy", "irony", "bias", "pseudoscience". Why are these the attributes chosen to measure? How are they calculated? Is something like "irony" even meaningful when trying to detect fake news?
Viewing the documentation section of the site, there is a small blurb claiming that it uses "custom AI classifiers", "custom machine learning models trained on fake and biased articles", and "database of trusted and biased websites created by our experts" to calculate these numbers. AKA, there is absolutely zero meaningful explanation as to how these numbers are calculated and why they should be trusted. This entire feature is a complete black box, and for all we know, the "database of trusted websites" could be created by Russian spies trying to sow misinformation.
Here's a summarization I'd like (and would pay for[0]): chats. I mean IRC logs, Slack & Telegram groupchat logs, etc. Between work and the local Hackerspace, people produce so much text on IM that I can't really keep up with it. I'd love a solution that I could feed such chat logs to, and get back e.g. a list of topics covered.
--
[0] - as long as it wasn't a cloud SaaS where I have to share my data with vendor's machines.
Totally agree. I tried to prototype this, but the datasets available for training ML summarization are pretty much all from news articles. Trying to take that model and summarize chats resulted in gibberish for me. The salient take-aways from chats and other non-factual / loosely structured text seem so dependent on what you care about that summarisation is difficult.
1) I did try it on https://www.reuters.com/article/us-southkorea-prisonstay-idU... and from the 20% and 40% summaries I have no clue about the actual meaning of the article. It seems that it just pulls some sentences out, but they don't really combine into a summary as a whole.
2) And it seems to have some fake news element as well: "fake: 0.343". Confusing.
To sum up my experience:
I'm confused, real: 0.9; fake: 0.1.
Anyhow, it seems that it has some great potential and may be useful for summarizing general knowledge facts in the future.
In theory that makes the language models auditable and tamper-proof. I'm not so sure about the supposed benefit of that, though. Yes, it means that the model itself cannot be tampered with (in order to introduce bias to the summaries, for instance) but as long as the algorithm itself remains closed source you could still alter the results by for example boosting some values while attributing less significance to others.
Simply publishing both the algorithm and the model as open source, along with a SHA-2 hash to make sure neither has been tampered with, would achieve a lot more in terms of reproducibility and trustworthiness.
Then again, they would've had one buzzword less in that case ...
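To make the hash idea concrete, here is a minimal sketch of what checking a published SHA-256 digest could look like. The function names (`sha256_of_file`, `matches_published_digest`) and the idea of a single model file are my own assumptions for illustration, not anything SummarizeBot actually publishes.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a local file against a digest the vendor published (hypothetical)."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Note that this only shows the file you downloaded is the one that was published. It says nothing about whether the hosted service actually runs that file.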
Right but you could also commit your changes to your model into a Git repo and use the "blockchain" that Git provides.
When people say blockchain, the implication is that distributed consensus is involved. In this case, there is no need for distributed consensus on ordering or anything else.
But if you are suggesting there are many text parsers that train the model, and there is a central model that's held by the network state, sure. But I don't know what the benefit of that is, as I don't think simply training on more text will allow this bot to produce better summaries.
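For anyone wondering what the Git comparison amounts to, here is a rough sketch of a plain hash chain with no consensus at all. It is a simplification and not how Git really stores commits (Git hashes structured commit objects and uses SHA-1 by default); `commit_id` and `build_chain` are made-up names.

```python
import hashlib


def commit_id(parent_id, content):
    """Hash a version's bytes together with its parent's id (simplified, not Git's format)."""
    data = parent_id.encode("utf-8") + b"\n" + content
    return hashlib.sha256(data).hexdigest()


def build_chain(versions):
    """Return one id per version; each id depends on every earlier version."""
    ids = []
    parent = ""  # the first version has no parent
    for content in versions:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids
```

Changing any earlier model version changes every later id, which is the tamper-evidence part. Nothing here needs multiple parties to agree on anything, which is the point being made above.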
As others have mentioned, I have no idea what any of this has to do with blockchain, or how it could even conceivably help with anything other than riding a hype train. That being said, I would be curious to see how it holds up against the standard state of the art (e.g. https://github.com/sebastianruder/NLP-progress/blob/master/e...), as summarization (especially the abstractive kind) is very hard but also has a surprising number of useful applications.
What kind of a person writes enough that it becomes useful to summarize what they write, but is incapable of summarizing what they have written so that a summarizebot becomes useful to them?
It probably would be more helpful on the pull side rather than the push side of the channel. Useful for consumption of content in certain situations by certain segments of readers.
Say, someone scanning through a list of legal docs to identify the most relevant ones. A short blurb would be pretty helpful.
I would find that useful, coming from a background of writing somewhat abstruse mathematics and now having to write for ordinary people. Summarizer tools seem to offer a useful productivity boost.
Idk about the blockchain stuff, but it is semi-featureful since it can extract text from images (I'm assuming it's just using Tesseract OCR. At least it could get the text out of this http://www.antigrain.com/research/font_rasterization/msword_...), and I assume audio (but I am having difficulty testing it out). Tbh the formats it accepts, as well as the fact that it is available over messenger, are huge "selling" points for me and should be way more emphasized on the product page than blockchain and AI.
It would be really cool to be able to use ML to translate high-level AI progress into standard American journalistic English. To some extent Bloomberg TicToc and Jinri Toutiao are already generating short-form video for breaking news stories.
I'd be curious to learn more about the use case for blockchain here.
For instance, Ocean Protocol plans to use TCRs for data quality, meaning that data providers and data consumers evaluate the quality of a dataset in a continuous way, so that certain assets can be moved up/ down the ranks in near real time.