I was really blown away by the results you achieved. Amazing work! My jaw hit the floor when I saw the witty farewell "fun guy" quip, and I was in stitches when I read the song about baking. I look forward to the day I can take the model for a spin - unfortunately I don't have the requisite $18,000 hardware ;)
I have a few questions: Could this be used as a tool to get a feel for public sentiment? For example, could you ask the bot what it thinks about gun control and have it spit out a policy that appeals to the common public? If you ask the bot what it thinks about how a company will perform, how accurately does it predict? I know that the model will contain the biases of the data set, but I'm curious if you've run these types of experiments. What do you think the results would be if you had an even bigger, more diverse corpus? (devil's advocate, for the sake of discussion: perhaps everyone's fb messenger and WhatsApp chat history)
Finally, you have clearly gone to great lengths to make the bot pleasant to interact with. What sort of results to you get when you train such a huge model on an uncurated corpus and don't try to tweak its personality? I find myself wishing that you didn't try to do this as the bot seems to be hyper-agreeable. I. E Too many responses like "You like watching paint dry? That's super interesting! I love watching paint dry!".
I would not encourage using the model for anything other than AI research -- we're still in the early days of dialogue, and there are a lot of unexplored avenues. There are still nuances around safety, controlling generation, consistency, and knowledge involvement. For instance, the bot cannot remember what you said even a few turns ago, due to limitations in memory size.
In the paper, we did explore what happens when you do NOT fine tune it on the specialized tasks (knowledge, empathy and personality). The non-finetuned bot was both less engaging and more toxic. The special finetuning is really important to getting this bot to be as high quality as it is.
But toxicity and quality is subjective. The technical achievement is undeniably brilliant, but the quality of the personality is subject to opinion - as I mentioned, I did not personally enjoy the agreeability of the bot. What's toxic today may not be toxic tomorrow and vice versa.
It's just a matter of time before a model of this size can be run on commodity hardware and somebody will take the brakes off and/or attempt to run experiments that aren't just "can this thing pass the turing test?". I'd be really interested to know the thoughts of the team, given their expert knowledge and experience with the matter.
It's not obvious? Facebook has enough of your conversations to fingerprint your writing style. Now they can de-anonimize your reddit account and serve you even more personalized ads.
That sounds like total overkill. To deanonymize a reddit account, it is easier to just use a add/cookie/script on reddit to link it to your facebook account you have open in another tab.
Ok, I don't know about any reddit details, but I just opened reddit.com without a adblocker and took a look in the network tab in the dev tools.
I mean, I don't see a plain facebook connection, but I see googleads and amazon connection and don't know what else obscure things. I doubt it would be hard to sneak something in, that just checks whether this is the same browser where you just switched over from the facebook tab (where the onblur event just got captured). But again, I am not an expert in tracking ads, nor reddit, I just know the web and its various data transmitting technices quite good.
That's definitely a fair concern. We believe that open science and transparency are the right approach here. By releasing it, we ensure that everyone is on the same page with respect to abilities and defense.
Defending against such an "attack" is much easier if the technology is widely available and many people can play around with it and explore the limits.
At some point you have to let the rest of society in on it. As technology advances, if you keep the power away from the public, eventually you will split into two civilizations a vast divide creating an asymmetry that will be exploited by those on the elite side of the divide. It must be an aspect of technological rollout to figure out how to keep its usage safe once it is widespread.
You are assuming that such tech is not already being (or has been) developed covertly by malicious actors. Developing this and making it open source brings more awareness about the subject and will make it easier to develop defense models against such bots (whether already in existence or that will be developed in future).
> You are assuming that such tech is not already being (or has been) developed covertly by malicious actors.
I fail to see where playing out the "but others are doing it too" card exempts the responsibility of those who either lower or eliminate the barrier to entry to these attacks.
every AI sound/word/picture editor i've ran into says something along the lines of "we're releasing this data set to help stay secure in this day and age of easy counterfeiting of X.", but they never really mention how you apply the data in an adversarial way against itself -- they just sort of hand-wave that part.
Same with fake AI generated Obama video and sound, and earlier data-set generated chatbots; it's plastered all over the projects things like "Since these methods are available we think that it's important that this data is disseminated so that other's can use it to validate real world data sources", but again -- how?
We have the real data, we have the fake data -- how is this diff done, exactly?
I'm willing to bet it isn't as easy as all the AI researchers who release this stuff claim it may be.
If its secret or not publicly available people will argue using Occam’s razor or that only “State actors” could use this. With the subtext being your not important enough.
With the data public its more akin to driveby ssh login attempts. Not being important doesn’t mean your not under attack and people can take the necessary precautions.
That's a bit like saying that nuclear secrets should be made public so that people can "take precautions" because "anyone can have a nuclear weapon, not just state actors".
There are few reasonable ways to "take precautions" against nuclear weapons and there are few reasonable ways to "take precautions" against something like this short of swearing off of social media entirely.
Without reasonable defences, all you really accomplish is ramping up proliferation.
I don’t think weapons of mass destruction is comparable. More like a security vulnerability for the mind. You can no longer be sure its a human on the other side.
Still, unlike Tay, we purposely did not create a service for it and do not advise creating one. This is for research purpose only and more effort needs to be made on safety before it can me more broadly consumed.
One of the tasks that we fine-tuned the model on is ConvAI2 (or "Persona-chat") which specifically aims to improve the model's consistency by conditioning its responses on a given persona. See here: https://arxiv.org/abs/1801.07243. In this research we found that conditioning on a persona improves consistency, but that the models still aren't always perfectly consistent. In particular, the model can only see a few turns back in the dialogue history, and may contradict itself if the conversation is long enough that the model simply does not see the full conversation.
As far as facts go, we also fine-tuned the model on the Wizard of Wikipedia task (https://arxiv.org/abs/1811.01241), which helps improve its knowledgeability. Again, it still isn't perfect.
For most the evaluations you reported at engagingness (expect Figure 16). Did you also look humanness? I would be especially interested how human your chatbot is compared to real humans (Figure 17). This would be similar to a turning test.
The bot is really optimized on engagingness rather than humanness - in particular this is how we chose the hyper parameters. We did evaluate Meena vs. Human (31 vs. 69) and Meena vs. BlenderBot (35 vs. 65). But didn't do BlenderBot vs. Human. Good suggestion though.
> Numerous issues arose during longer conversations. Blender would sometimes respond with offensive language, and at other times it would make up facts altogether.
I mean, to be fair, I've had many conversations like that...
Is it possible to learn sentiment analysis from Reddit? If they had access to modmail they could determine what's offensive to individual subreddits or groups, but I'm not sure if there's a way to gauge fiery reactions without that.
Maybe you could bootstrap it with an existing sentiment analysis tool, but that could easily lead to Garbage In Garbage Out.
Tay went bad because of a different mechanism. In the case of Tay, trolls figured out a way to make it repeat back arbitrary strings, and used that to create seemingly offensive dialogue. In the case of this chatbot, the offensiveness is coming from the underlying training data.
There was a post after Tay came out that argued that Tay's answer to "Is Ted Cruz the Zodiac Killer?" came from the training the data, because that was already a meme, and it came back with the quip within minutes of launch.
There is something similar to Tay on Facebook/Instagram but it's not as "offensive" so the problem with Tay must have been in the dataset or something. It's basically a bot attempting to be a social media influencer.
Facebook is worse. On Reddit you get insulted by an anonymous account with a novelty name. On Facebook you discover how crazy your former friends and relatives can be.
It’s one of the reasons I refuse to open Reddit anymore, 95% of the comments are insults, made up facts, misinformation or people living in their own bubble and refusing to accept any opinion other than their own.
There are some nice communities out there, but they’re usually well hidden and relatively small. Once a community grows, it turns into the same shit show as the rest of Reddit.
Want to learn how to make Kombucha, lock picking, or 3D printing? Then you probably want Reddit.
But anything approaching the popularity of a moderately successful video game turns into a shit show. One of the free to play games I use to play actually had two subreddits, after a community schism. Bonkers.
Did Facebook ask permission to create derivative works (the bot) from Reddit posts, I wonder, or does this fall under web-scraping law?
If I recall Reddit users still retain rights to their posts unless Reddit the company provides some sort off broad grants?
If they did not, this is an interesting example a company potentially making a great deal of money (if the bot is sold as something) from content that legally belongs to users without compensation. It's one thing if it abides by a site user agreement and users understand once they post it's gone, but to see it happen from a Reddit corpus seems odd.
Shorter version: source data has value and users should share in any value derived from their data if they have the rights to it.
Legally, https://towardsdatascience.com/the-most-important-supreme-co... gives a good example how transformational machine learning classifiers generally fall under fair use. It does raise a good point that generative machine learning, like this, has not been explored legally yet.
This is still research which will likely provide public good if/when they publish results and methods. Probably, they'll do a different dataset for any commercial work given the profanity problem highlighted in the article.
Making or not making money is such a weird way for people to see things. That's part of why I love the Free Software movement so much and abhor the CC-*-NC licences.
Fortunately, Reddit has the exception where they can give out access to anyone they want. But I still think StackOverflow is the gold standard: CC-BY-SA. No restriction on making money. Maybe a platinum standard would be CC-BY.
The point is not about the money - the point is using data contributed by users without the proper license to create something that might yield revenue which will then not be shared or payed forward in any way to the contributors. We have all worked hard to create the data used by companies to sell ads to us and make massive amounts of money. I guess I got a couple gigs of free email? Cool...
I also understand that most apps make us sign our lives away, but if I don't (as in the Reddit case) and I actually have rights to the data I sure as heck don't want that data used ANYWAY to power more of this stuff.
Probably a gross overreaction, but it seems like an externality that we've kinda just accepted as society that I'd like to see change a bit.
In Reddit's case, that's the deal. You get a website to share things on with other people, and the value exchange involves you giving full licence to Reddit and giving relicense rights to Reddit.
Personally, I find that a very fair deal and clearly other people do as well. I think it actually yields positive externalities because we get things that wouldn't exist otherwise because the transaction costs outweigh the value, but the transaction costs are an inherent cost and I don't want to levy them. Fortunately, Reddit gives me the ability to not levy them and to guarantee that I won't levy them.
In fact, this is part of the magic of Free Software: true freedom to use. Yes, Google can use so much work which was done and it doesn't have to pay any of it back to Torvalds or Greg Kroah-Hartman or even me for the minor changes I made to libraries. This is freedom. I prefer it. And fortunately the world is aligned in this direction.
I want to agree with you with 100%, but something is nagging at me a bit. Just like free software that ends up in a paid product and then winning or settling in court because the company has more resources to use the judicial system, when we apply this directly as a societal value this starts to break down in practice.
The freedom you are talking about ends up justifying (in practice) a situation that only provides real freedom for a small few that happened to take advantage early and use other asymmetries in society to consolidate control. Sure, we fix those we're all set! (maybe?)
But until then perhaps we can agree that as a society we expect (and might ask for, by law) a little something extra from companies that have benefitted to help ensure others after them have a chance to use this freedom as well.
My argument is not as well thought out at this point, I grant you. Thanks for providing me with a lot to think about.
Yes and no. You're right that there's no button you can push to delete an entire account history, but wrong that there's no way to remove HN comments. We take care of deletion requests for people every day. We don't want anyone to get in trouble from anything they posted to HN, there's nearly always something we can do, and we don't send people away empty-handed. I can only think of one or two cases where we weren't able to make a user happy, and neither of those cases had to do with identifying information being left up on the site.
The reason we don't delete entire account histories wholesale is that it would gut the threads that the account had participated in, which is not fair to the users who replied, nor to readers who are trying to follow discussion. There are unfortunately a lot of ways to abuse deletion as well. Our goal is to find a good balance between the competing concerns, which definitely includes users' needs to be protected from their past posts on the site. I don't want anyone to have the impression that we don't care about that; we spend many hours on it.
I get the reasoning, but I don't see this applied to some platforms. Reddit and Discord allow you to both delete and edit older comments, and there's no limits on how far back you can go (so you can, if you wanted, edit or delete your entire history).
Under the GDPR a subject is allowed full erasure rights. If I say I want you to delete my content from x date to y date, or a particular post, or everything entirely then that shouldn't be an issue. A request may be bothersome, but that's what happens when you don't offer that functionality natively.
I noticed a few days back you didn't like it when a user made a new account, except with the internet these days and how everything is archived for all time, throwaway's are the only option. Building a comment history is extremely dangerous, especially when you might forget what details you may have posted or how meta-data can leak through (such as what subs you post in, any details you posted that could identify you etc).
You can't have it both ways: no to multiple accounts and also no to control over your data. I might have 50 accounts, dislike it? Give me proper control over my comments. (to be honest, it may just be worth making a new account for every comment for maximum privacy, it's extreme, but it's a viable option).
If I want to delete them, that's my choice to freely make. Your thoughts or concerns are not relevant to me, thankfully, the GDPR agrees.
I noticed a few days back you didn't like it when a user made a new account
I think you must have misunderstood whatever the moderation comment was, there's no prohibition on throwaway or multiple accounts. Just against using them to violate the site guidelines which is a different thing.
The BBC, like a lot of news orgs, wants to maximize advertising impressions, and the way to do that is to keep almost all links pointed to itself. A link to the more substantive source is a reader lost.
>The BBC, like a lot of news orgs, wants to maximize advertising impressions
the BBC is to a large degree publicly funded and a public service broadcaster, and advertisements are only shown outside of the United Kingdom. IIRC over 75% of their funding comes from British license fees, most of the rest from licensing their content to third parties outside of the UK under a separate commercial branch.
(BlenderBot original developer here) You can read the paper here, which contains much more technical details of how we built it: https://arxiv.org/abs/2004.13637
Clarification for people who don't get the joke: M-x doctor (doctor.el) is an Emacs mode — basically a builtin chatbot. This is me feeding the human part of the conversation featured in the Facebook blog post into doctor.el.
I'm not the parent but I think the parent literally pasted the lines from the example as-is. As others pointed out, it's difficult to get an instance running on your own.
There's a general "Linux" one and a "SysAdmin" one. But yeah, no specific Arch one.
Also, there's a version of that subreddit using GPT-2, if you're interested, seems to provide way cooler results compared to the one you linked which uses Markov Chains.
https://reddit.com/r/SubSimulatorGPT2
Work lunches, weddings, and so forth are a control for that. My experience is you always know who the vegans are in advance.
Although I have learned from that that you need to hover around the vegan dishes to stop the non-vegans from eating it all before the vegans get to it, or order far more than should be needed.
Good food is good food. Some good food happens to be vegan. It isn't hugely "special" especially these days, when Indian food is reasonably popular; this, incidentally, debunks the notion that vegans all eat weird concoctions of soy meant to resemble meat. I'm sure some do, but a curry which happens to contain no animal products is much more appealing.
They use a fair bit of ghee don't they? I went from memory but Wikipedia seems to agree with that.
"Ghee is widely used in South Indian cuisine for tempering curries, in preparation of rice dishes and sweets. South Indians have a habit of adding ghee to their rice before eating it with pickles and curries. South Indians are one of the biggest consumers of ghee."
While everything tastes better with ghee, you can cook a lot of Indian food using plant-based oils. I use sunflower oil for a lot of dishes, but you can also use vegetable oil, olive oil, or mustard oil.
With a bit of effort, you can eliminate animal products from most vegetarian Indian food. It will taste different, but not too much.
I don't think anyways arguing that oil isn't a replacement for butter. I think we're curious about the expectation. Can we expect most Indian restaurants or caterers to use oils instead of ghee? Do you need to ask them if they use ghee?
Fancy Indian restaurants in the West as well as back here are almost certainly using ghee. Cheaper places might use vanaspati (a ghee substitute made with palm oil) or oil. In general, though, you can never tell for sure unless you're doing the cooking yourself.
I really do agree with you, but butter and dairy are pretty ubiquitously used in American cooking, it's actually really difficult to eat at restaurants that don't use butter on things that would definitely be vegan otherwise.
>when Indian food is reasonably popular
I'm willing to bet that a lot Americans who try to make Indian food at home will use butter instead of oils for their pans. Also, the most popular Indian dishes that don't have meat, namely * Paneer, is cheese.
But yes, if I wanted to avoid animal products, I would definitely try the Indian place. They definitely have at least thought about whether or not animal products have gone into their food, while usually the average American isn't really sure of the ingredients to their foods.
>debunks the notion that vegans all eat weird concoctions of soy meant to resemble meat. I'm sure some do, but a curry which happens to contain no animal products is much more appealing.
Rice and beans, a complete meal nutritionally and easy to make interesting!
But to address the point of GP's post:
You really do need to hover over dietarily restricted food. It's really sad when you order 1 GF dish and everyone wants to try it, leaving the one person with an allergy to drink soda. Someone needs to be there saying: "there's a limited amount of this food for people with dietary restrictions. If you don't have that dietary restriction, please move on. There's plenty of food for you."
Reddit a dumpster fire? Some of Reddits technical communities have a way higher signal than HN, like Reddit’s math or rust community. Reddit math is more curious about math than HN is about tech.
Reddit's best big technical communities are better on this regard too. Their moderation is more transparent and dependent on volunteerism, and somehow even the more serious communities squeeze in more humor.
Also, HN gives you the vibe such that you'd wish to argue about the orthogonality of empathy and signal? As opposed to HN feels like SO?
Do they actually though? If you have empathy for the people in your community, are you not more likely to contribute more meaningfully to that community (I.e. more signal)?
To even have a snowball's chance at success, they would have had to make use of reddit's voting system. Tons of toxicity and disinformation still makes it up into highly upvoted comments, but I'd expect throwing away heavily downvoted comments to exclude a good fraction of the utter crap.
Hi, paper author here. The model was fine-tuned on (non-Reddit) data that was specifically designed to give it positive conversational skills, like empathy, knowledge, and personality (see: https://arxiv.org/abs/2004.13637).
No FB data was used to train these models, which is what allowed us to open source it.
I don't understand that last bit. Can you expand a bit please? Earlier comments in this thread say the copyright on a model is a grey area and can be classified as fair use
To be fair, there is plenty of content on reddit that is far more empathetic and human than that created by many other people who call it a dumpster fire from afar.
It's not just the 'messages' they're using, it's the 'voting' and 'moderation' human inputs as well.
If a comment is deleted by moderators, the absence of that comment influences the outcomes of using the dataset.
Facebook has no such human moderation of all conversations. Neither does Twitter. That's why it didn't turn quite as evil as the Microsoft bot.
But in the end, this all critically depends on human beings making human judgments and having those taken into account when training the bot. The text itself is secondary. If it was just text, Facebook could have trained using their own dataset. This way, they get all the benefits of volunteer moderators (upvotes, downvotes, moderator-deletes all qualify) without having to pay anyone a single penny for their effort.
I'd love to see Facebook train a chatbot using messenger data only from conversations between people and their grandmothers.
Obviously this is creepy because of privacy concerns, but I would love if I could pick the 'personality' of my chatbot, so that it responded to me in language and tone more like a responsible adult and not an average redditor.
Of course you could always run it the opposite way and get a chatbot that responds like a teenage daughter does to her parents. That would be both equally hilarious and painful.
> How can facebook turn a dumpster fire like reddit in a bot that response with more empathy than a human?
Because people who say “Reddit is a dumpster fire” are usually just thinking of r/Politics, RedPill, TheDonald, LateStageCapitalism, basically any remotely political subreddit... when, in reality, there are plenty of subreddits where quality conversation can be had and some where people just share art or animal pictures, and these are pleasant places to kill some time (although there’s almost always per-subreddit groupthink, but it’s not like HN doesn’t also suffer from that in some cases).
Yeah I think a huge advantage Reddit has over other social networks is that it's broken up into myriad communities which are able to develop their own norms and self-govern to some degree. In smaller communities, this totally sidesteps some of the issues you have on a platform like Twitter, where people with very different beliefs and assumptions are essentially having conversations about how much they don't like each-other in the same room.
Reddit has no shortage of problems, but it's the most civilized large online discussion platform by far.
Moderation, partitioning of interests into subreddits, and the existence of downvotes go a long way to reeling in the worst things about online discussions.
> You may talk with our models. The 2.7B can be interacted with on a 16gb P100 GPU or better. The 9.4B parameter model requires at least two 32gb V100 GPUs to interact with.
Hi there. One of the original authors of the BlenderBot paper here. This cocohub website is not what we released. It appears that cocohub has removed our safety layer and provided other modifications to the bot. We released our model open source, along with specific instructions on interacting with it safely: https://parl.ai/projects/recipes/
It's important to note that dialogue safety is a very important and nuanced topic, and we did our best to release a safety system attached to the model. Our system is not perfect though, and that is why BlenderBot was released as a research project for furthering the state of Artificial Intelligence, and is not meant for production purposes.
I would also mention that the blender small model significantly underperforms compared to the larger models released with the paper, and encourage everyone to try our best models, not our small one.
Hi stephenroller,
I posted it as is (without the bad words filter).
I also think it's very clear the model reflects the training data.
And also kudos and thanks for releasing the code along with the paper.
I'm obviously not claiming this a bot by facebook just research made accessible. let me know if you'd like to amend any of the texts next to it to make it any clearer.
(just tried to properly credit the creators)
Hi there, thanks for asking! It's very cool to make research more accessible. Would you be able to rename your experience to "Modified Small BlenderBot without Safety Layer"? And I think it’d be helpful to call out what changes you made when you say it’s adapted from our research.
But to be honest Blender is kinda underwhelming. I had better conversations with alice derivates. Blender feels bland, like a dozen different responses with only some words from my text inserted.
Sure!
It is the 90M params models and they trained models up to almost 10B params so I guess it gets better with the size (Didn't try way too expensive).
And I agree about the alice derivates mitzuku is nice without doing anything fancy.
I would imagine Reddit, being a forum of threaded posts, has far, far, far more conversational interactions than Facebook where everything is basically one-shot, no threading. You want to train a convo bot on conversations.
I’ve actually always wanted to do this; Categorise FB comments based on the original post, then build some sort of Q&A platform where the answers are just those FB comments. (Of course with randomised names and profile pictures)
Facebook is not a university, so there's no technical need for an IRB but I'd be curious what, if any, review process was done to approve this research?
I understand that posts I made are in public, but I feel uneasy about a for profit company I am not a user of scanning, archiving, and using posts I made in public to aid their business, especially if they have a huge corpus of data from people who opted into the product.
(Also I am using a throwaway for privacy, but I will proactively note I do not have any stock in, nor am I an employeee of, any Facebook competitors. But I fully admit I deleted my Facebook, and I did so because I did not feel like my data was being treated respectfully by the company.)
Thank you for taking the time to post this, and sorry to put you on the spot in a public forum - if this was Twitter I'd have DMed :)
>Which has never been created. How about start with that.
Exactly. This is basically the digital equivalent of a parrot. I'll start to get excited if any of these bots can produce a response to the question "what did we talk about 10 minutes ago?"
I think it's probably because Facebook comments are garbage. Just people tagging eachother and saying lol. Reddit's are hardly high quality but at least they are roughly sentences most of the time.
I think that one day, all of github will be slurped into a bot that will take over most engineer tasks, and while it will not replace engineers, it will make at least 60% of engineers redundant.
Are you an engineer? How is your time divided between building (i.e. coding) and deciding what to build (meetings, gathering requirements, understanding the problem, etc.)? First we will automate sooner or later. The latter will be rather more difficult.
Imagine you had 1 group of engineers tasked with 40% meetings, 60% programming tasks.
Now, you split that group 40/60 into 2 groups so that one group is allocated 100% to meetings, and the other group 100% to programming. Now replace the latter group with bots.
I'm having nightmares about a dystopian future in which I scream into the void of a Facebook chat bot and hear a response from the voice of a billion anonymous redditors.
> I was hoping this would be about how bears are evil and we should all get rid of them! I am very disappointed!
Also, a conversation:
> I love the list. I feel like I should read more.
And reply:
> The list is a bit long, but the bear is one of my favorite fictional creatures. A bear of pure intelligence; an evil bear! A bear of pure desire to conquer!
Now, a GPT2 bot trained on the heavily-moderated /r/AskHistorians subreddit:
> How did European and Asian cultures come to know about the moon during the Middle Ages?
A quote:
> I don't know enough to really comment on this subject, but I would suggest looking up the History Channel series "Ancient Aliens" which covered the discovery of the moon.
A longer quote, with some interesting fake facts:
> I don't have a source, but they did not observe the moon for 300 years. It was first observed in 564 BCE by the Chinese Emperor Diocletian. The idea of space travel was not the same as that of modern science, and the Chinese weren't trying to be overly scientific (they were doing this during a time when China was ruled by the Han Dynasty and didn't have to worry about scientific advancement) so they did not have a good understanding of the universe when it was first discovered. The Chinese did not invent astronomy until the Song Dynasty, and did not have any sort of understanding of the solar system before that. There was a theory in China about the existence of other bodies in the solar system, but it was never really explored and had no evidence to back it up (because most people did not believe in the existence of other celestial bodies, even though there were many theories about the existence of many different celestial bodies). The Chinese did not have the technology to actually observe the moon. They were not able to observe it with telescopes, and so they only knew about the moon. The Chinese did not have an understanding of the solar system before that, and did not have any understanding of the moon, so they did not know what it was. They were not even aware of the existence of other celestial bodies at that time, so they didn't know that there was one.
The "Chinese Emperor Diocletian" is hilariously wrong, but it flows right and sounds reasonable in context. Similarly the phrase "they were doing this during a time when China was ruled by the Han Dynasty and didn't have to worry about scientific advancement"; it sounds like something an educated person would write about history, even though it's almost certainly entirely wrong.
> Man Gets Sentenced To A 1-Year In Prison After Trying To Kill A Pork Custodian By Shooting Him In The Face
"Pork Custodian" is the only thing which doesn't work there.
Now, the fake news, formatting in the original:
> A little background on the situation. It appears that on the evening of 9/2/15, the police were called to a local residence after a man tried to shoot his neighbor, shooting him in the face. From the article:
>> The incident occurred when a man, who has not been named, went on a violent rampage.
>> The man, a resident of the residence, was reportedly upset about the way his neighbor's dog was barking. In the ensuing confrontation, the suspect shot his neighbor in the face.
>> The victim, an elderly man, was shot in the right temple and was transported to a local hospital.
>> The man, who has not been identified by authorities, was apparently intoxicated and apparently wanted to kill his neighbor. The man shot the man's neighbor in the face with a .38 caliber handgun.
>> The victim was taken to a local hospital. He is in stable condition.
>> The man is being held in the Polk County Jail and will be arraigned on 11/7/15 in front of a judge.
Anyway, I'm not sure what Facebook was expecting. Bots can imitate human text reasonably well sometimes, but they don't understand context or the concept of facts or reality yet.
Most of those comments are so human/Redditor that I'm now convinced at least 25% of Reddit comments come from bots. Plenty of these stupid interactions are at least as lucid/lame as standard Redditor banter/arguing.
I’ve been developing some chat bots recently professionally and, honestly, I don’t get the point of these kinds of projects, especially done by companies like Facebook - sure you could create some text generator based on tokens, but it serves no purpose.
Give me a trained bot that can extract specific things in various different ways users express them (without me creating dumb questionnaires), match across thousands of domain specific technical variations of terms, understand voice as well as text... until then it’s all stupid tricks that just show Facebook has too much money to waste.
Paper: https://arxiv.org/pdf/2004.13637.pdf
Open Source: https://parl.ai/projects/recipes/
Ask us anything, the Facebook team behind it is happy to answer questions here.