First off, here is the link to my tool for scoring the relevance of documents/data: https://relevance-filter.digitaldrivenworld.com/
As a non-technical researcher, I have experienced major difficulties when analysing social media data. Most of my research examines a particular social issue, and most of the scraped data were irrelevant to my research questions. For example, I worked for Social@risk to investigate Vietnamese textile workers’ grievances and collective actions by analysing a large corpus of Facebook posts and comments. The most time-consuming task was probably finding the relevant data: we had to scrape all the posts from many Facebook groups and pages, and most of the posts and comments were irrelevant to our research problems (there were too many ads). I had the same difficulties when analysing Singaporean citizens’ sentiment and opinions about the country’s Smart Nation initiative. We scraped millions of posts and comments from politicians’ and organizations’ Facebook pages and had to filter out everything that was irrelevant to the Smart Nation initiative. It was always like looking for a needle in a haystack, and we were always afraid that we would accidentally remove important data while filtering out irrelevant content.
That’s why I developed an application and proposed a method to score the relevance of textual data. The application is not perfect yet, but I have tried it on some of my data corpora and got relatively positive results. At the very least, it saves time compared to searching for keywords in Excel, and I can guarantee that it is more useful than that. I have attached a document explaining my approach, the rationale behind it, and what researchers should take into account when using the tool. I hope I will find time to modify and improve the tool based on my professor’s suggestions.
I’m very grateful to my professor, Dr. Bernhard Rieder, who gave tutorials on Python (and much more) and practical advice for my very first programming project. I’m not a fast learner when it comes to programming, but his patience and dedication helped me get through. My classmates in the tutorials also gave me many suggestions for my project. And last but not least, I wouldn’t have completed this project without my partner’s help. You see, I only designed the tool and coded the back-end. He helped me build the database, set up the server, develop the front-end, and fix my bugs :). I can’t be grateful enough for all the support I received.
I hope the tool will be useful to someone. Please let me know if you have any suggestions or run into any difficulties; your feedback is highly appreciated.