Document management is a vital task in any enterprise. In many domains, massive numbers of documents written in natural language are available, but they are not well organised, and the relations between them are not made explicit. Organising this amount of information manually is not feasible. A method that can extract the similarities between documents is a first step toward an autonomous framework for managing them.
In NLP, distributed representations (word embeddings) of text have been widely studied. In this approach, a vector represents a natural-language element such as a word, phrase, paragraph, or even a whole document, and this vector representation captures the element's semantics.
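As a minimal sketch of this idea (not the project's actual method), the toy vectors below are hand-crafted, hypothetical embeddings; real embeddings are learned from data and have hundreds of dimensions. The sketch shows how cosine similarity over vectors can reflect semantic relatedness, and how a simple document vector can be formed by averaging word vectors.

```python
import math

# Hypothetical 3-dimensional embeddings for a tiny vocabulary
# (illustrative values only, not learned from a corpus).
embeddings = {
    "paper":   [0.9, 0.1, 0.0],
    "article": [0.8, 0.2, 0.1],
    "banana":  [0.0, 0.1, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1.0 means similar."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def document_vector(words):
    """A crude document representation: the average of its word vectors."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# Semantically related words end up with higher similarity than unrelated ones.
sim_related = cosine_similarity(embeddings["paper"], embeddings["article"])
sim_unrelated = cosine_similarity(embeddings["paper"], embeddings["banana"])
```

In practice one would use a learned model (e.g., word2vec- or doc2vec-style training) rather than hand-set vectors, but the similarity computation over the resulting vectors is the same.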
On the other hand, each document has metadata that relates it to other entities, including people (e.g., authors) and documents (e.g., cited papers). Modelling documents by taking into account both their content and their context is therefore essential.
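To make the notion of context concrete, the sketch below represents such metadata as a small graph of hypothetical documents, authors, and citations (all names are illustrative). Two documents sharing an author or a cited paper are contextually related even if their textual content differs.

```python
# Hypothetical metadata: each document links to authors and cited documents.
metadata = {
    "doc1": {"authors": ["alice", "bob"], "cites": ["doc2"]},
    "doc2": {"authors": ["carol"],        "cites": []},
    "doc3": {"authors": ["alice"],        "cites": ["doc2"]},
}

def shared_context(a, b):
    """Return the context entities (authors, cited documents) that
    documents a and b have in common."""
    shared_authors = set(metadata[a]["authors"]) & set(metadata[b]["authors"])
    shared_cites = set(metadata[a]["cites"]) & set(metadata[b]["cites"])
    return shared_authors | shared_cites
```

A content-only model would treat doc1 and doc3 independently, whereas their shared author and shared citation suggest they belong together; a combined content-and-context model can exploit both signals.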
In this project, we seek to address the problem of modelling documents based on both their content and their context.