Hi Ted,
Thanks for going through the article. I'm glad it sparked such interesting ideas for you, and for me as well.
Regarding embedding words in their original context: I did not try it. From prior experience, I knew that pre-trained BERT models can generate similar embeddings for similar words even without any context, so without thinking too much I used that directly to extract synonyms. That does lead to a problem, though: even for completely opposite words, it produces very similar embeddings.
This is probably an artifact of the original pre-training: antonyms tend to appear in very similar contexts, so their embeddings end up close together even though the words have opposite meanings.
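To make that concrete, here is roughly what the context-free comparison looks like. This is just a minimal sketch, not the exact code from the article; I'm assuming the Hugging Face transformers library and the bert-base-uncased checkpoint:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(word: str) -> torch.Tensor:
    # Encode the word on its own (no surrounding context) and mean-pool its
    # sub-word token vectors, skipping the [CLS] and [SEP] special tokens.
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    return hidden[1:-1].mean(dim=0)

hot, cold = embed_word("hot"), embed_word("cold")
# Often surprisingly high, even though the words are antonyms.
print(torch.cosine_similarity(hot, cold, dim=0))
```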
This is definitely interesting, and I will take a look at embedding the words in their original context. As you point out, I will need to figure out some way to extract individual word vectors from the full document or sentence representation it gives me (see the rough sketch below).
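Mostly as a note to myself, one way this could work is to run the whole sentence through the model and then pool only the sub-word tokens that belong to the word of interest. Again, just a sketch under assumptions: it relies on a fast Hugging Face tokenizer for word_ids(), and the function name and example sentence are purely illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def extract_word_vector(sentence: str, word_index: int) -> torch.Tensor:
    # word_ids() maps each sub-word token back to the word it came from,
    # so we can pool only the tokens belonging to the word we care about.
    encoding = tokenizer(sentence, return_tensors="pt")
    word_ids = encoding.word_ids(batch_index=0)
    with torch.no_grad():
        hidden = model(**encoding).last_hidden_state[0]
    positions = [i for i, w in enumerate(word_ids) if w == word_index]
    return hidden[positions].mean(dim=0)

# "bank" is word index 4 here, and its vector now carries the river context.
bank_in_context = extract_word_vector("I sat by the bank of the river", 4)
```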
The article will be updated whenever I have something to share on this!
Thanks again for this awesome comment; this is one of those moments where a comment is as helpful as the article itself.