Entity tagging one of the first areas within the lab
Entity tagging, or Named Entity Recognition (NER), is one of the first areas that the Swedish Language Data Lab (Svenska Språkdatalabbet) has chosen to focus on.
The project team and the companies in its reference group attended presentations by Språkbanken (“the Swedish Language Bank”), RISE, Recorded Future and Talkamatic on the work done to collect and assess words, concepts and meanings. The aim is to use the resulting models to analyse the significance of words in their context.
“We have chosen to start by focusing on data collection and Named Entity Tagging, as this base model can be used for many purposes by the users of the data lab. The models will be available to use directly or to adjust for specific needs in organizations,” says Vanja Carlén, project manager at AI Innovation of Sweden.
“The data comes from a large number of sources: social media, Internet forums and reviews, as well as a large amount of data from news reports. The model is thus trained on words and sentences that are used in everyday language.”
Valuable input from the reference group
The on-site reference group told us more about how they currently work with NLP (Natural Language Processing), about their challenges, and about how their activities could contribute to a fully developed Swedish language library.
“There is a great deal of interest in this project. It has been very interesting to hear how the reference group is currently applying NLP and to see how they are able to apply the models and datasets that have been developed as part of the project,” Vanja continues.
There are some obvious application areas for NLP, such as language-based services like chatbots and customer service, but there are many more areas where it has great potential:
“If we can use AI to listen to meetings and compile meeting minutes, we could save over 10,000 hours of work per year,” said Maria Hedwall from Astra Zeneca.
“As well as the time-saving factor, we can also remove the factor of human bias, which may result in notes being taken on the basis of an individual’s priorities.”
Next up, sentiment analysis
The Swedish Language Data Lab will now move on to its next work package: the development of a sentiment analysis model, which attempts to determine whether a text is positive or negative and to capture its general tone. It will be possible to use the model for different types of text categorization and feedback analysis.
Vanja Carlén summarised the results so far:
“It was great to show the model that has been developed. The model shows good results, and we already see potential implementation areas and interest from the reference team in using the model.”
“The next step is now to develop the model further and understand the dialogue perspective, to broaden the possible usage of the model. I am looking forward to learning about the new insights and the sentiment analysis model at the next reference team workshop in spring 2020.”