All the pre-processing is taken care by the summarize() itself.
Title is currently not used in generating summary.
Matrix containing how many times each word repeated in a sentence.And the values are normalized by dividing with length of unique words in each sentence.
Single row matrix containing idf scores.
Matrix obtained by multiplying tf into idf matrices.
Scores of each sentence in the given data.
The summary generated using TF-IDF scores.
Term_frequency = (Number of times a term 't' occurs in a sentence) / (Total number of unique words in that sentence)
Inverse_document_frequency = log((Total number of documents)+1 / (Number of documents in which term 't' appears)+1)
Here is a class diagram, containing all the methods within the saramsh package.
Below is an workflow of how the module generates summary using the given data.View
Below is simple starter code :
>>> from saramsh_package.saramsh import Saramsh
>>> sm = Saramsh(data , title)
Divya Dutta: ‘I once lost a role because I was told I am fair’
She wants to do your role.Earlier, we used to say chhota role nahi karenge.I never said that, but I have seen girls say it.Now they say chhota hai, koi baat nahi, impactful hai na?Divya herself had to colour her skin for Delhi 6.They were looking for a village woman who is darker.They told me you suit the role completely.I asked then why am I not doing it?They replied you are too fair for the role.But I am an actor.I was darkened for Delhi 6.