It's a little tricky getting this to work because you need two separate models working together, but I tried it out. Here are some of the samples I generated:
https://imgur.com/MAupphr Roses in general don't seem to work well; there must not have been many in the dataset.
You can see that it works better than one would expect, but there are definitely limits to its understanding. The flower and COCO datasets are, ultimately, not that big. What would be exciting is training it on an extremely large, well-annotated dataset like Danbooru.
One possible improvement would be training the text embeddings jointly with the rest of the model (instead of using pretrained embeddings like skip-thought vectors). It's on my to-do list; I'll try it out.
https://imgur.com/Uwp1wfu
https://imgur.com/yuW9Yre
https://imgur.com/oZ4wzdC some definite weaknesses in the natural language embedding
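Conceptually, the joint-training idea above comes down to whether gradients flow back into the text embedding table or stop at the generator. Here's a minimal numpy sketch of that difference; all names, dimensions, and the concatenation-based conditioning scheme are made up for illustration, not taken from the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, emb_dim, z_dim = 10, 4, 3

E = rng.normal(size=(vocab, emb_dim))      # text embedding table (frozen or trainable)
W = rng.normal(size=(emb_dim + z_dim, 1))  # stand-in for the generator's first layer

def forward(token_ids, z):
    # Condition generation on text by concatenating the mean token
    # embedding with the noise vector z, as text-to-image GANs often do.
    text = E[token_ids].mean(axis=0)
    return np.concatenate([text, z]) @ W

tokens = np.array([1, 3, 5])
z = rng.normal(size=z_dim)
y = forward(tokens, z)                     # shape (1,)

# One SGD step on a dummy loss 0.5 * (y - 1)^2.
lr, err = 0.1, y - 1.0
x = np.concatenate([E[tokens].mean(axis=0), z])
grad_text = (W[:emb_dim] @ err) / len(tokens)  # gradient reaching each token embedding
W -= lr * np.outer(x, err)                     # the generator is updated in both setups
E[tokens] -= lr * grad_text                    # this line only exists with joint training
```

With pretrained skip-thought vectors the last line never happens, so the text representation can't adapt to the image-generation task; joint training lets it, at the cost of needing enough captioned data to learn good embeddings from scratch.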