The Quick War? The perception of WWI in German local daily newspapers of the time

As the expectation of a quick war faded, did spirits stay high, or did they fade as well?

See the GitHub repo for this project.
This project won second place at the ACDH virtual Open Data hackathon series 2019: International Open Data Day hack.

Introduction

For this project, we wanted to see how German newspapers published during the First World War depicted what was going on in the trenches and in world politics. We used a variety of publications from Hamburg provided by the Europeana Newspapers Project, selecting the daily newspapers from 1913 to 1919 so as to cover wartime as well as the pre- and post-war years. To this material we applied dynamic topic modeling and sentiment analysis. The whole project was written in Python.

We went into the project with a clear hypothesis in mind: as WWI took up more and more space in the papers, sentiment towards its development would move in the opposite direction. We pictured high spirits that would progressively plummet; we were wrong.

Step 1: Spring Cleaning

We began by cleaning the dataset, which contained a considerable amount of gibberish and words lost in digitization, too much even for the spell-checker. In the end, we removed stop words, very infrequent words and, of course, the gibberish.
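To give an idea of what such a cleaning pass can look like, here is a minimal sketch. It is not the project's actual code: the stop-word list, the "letters only" rule for spotting gibberish, and the names tokenize, clean_corpus and min_count are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real run would use a full German list.
STOP_WORDS = {"der", "die", "das", "und", "in", "den", "von", "zu", "mit", "ist"}

# Assumption: a token is a plausible word if it consists only of letters
# (including umlauts and eszett). Anything else is treated as OCR gibberish.
WORD_RE = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def tokenize(text):
    """Split on whitespace, strip punctuation, keep alphabetic tokens of length >= 3."""
    tokens = []
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'")
        if len(token) >= 3 and WORD_RE.fullmatch(token):
            tokens.append(token)
    return tokens


def clean_corpus(documents, min_count=2):
    """Drop stop words and words seen fewer than min_count times in the whole corpus."""
    tokenized = [tokenize(doc) for doc in documents]
    counts = Counter(tok for doc in tokenized for tok in doc)
    return [
        [tok for tok in doc
         if tok.lower() not in STOP_WORDS and counts[tok] >= min_count]
        for doc in tokenized
    ]


docs = [
    "Die Truppen stehen an der Front, x9#k lll1",
    "Der Krieg und die Truppen an der Front.",
]
cleaned = clean_corpus(docs)
```

On a corpus of realistic size, min_count would be set much higher than 2; the value here only keeps the toy example readable.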

Step 2: Topic Modeling

For the topic model, we used Gensim. We wanted to extract one topic that reflected the war and see how that topic's most relevant words would vary over time. The result of the regular topic model (LDA) shows the different types of articles in the papers, one of which covers war-specific articles.

topic #3 (0.010): 0.003*"Regierung" + 0.003*"Mark" + 0.002*"Arbeiter" + 0.002*"Aktien" + 0.002*"Arbeit" + 0.002*"Antrag" + 0.002*"Reich" + 0.002*"Volk" + 0.002*"Frieden" + 0.001*"Berliner"

topic #5 (0.010): 0.003*"Krieg" + 0.003*"England" + 0.003*"Regierung" + 0.002*"Front" + 0.002*"Kriege" + 0.002*"Truppen" + 0.002*"englischen" + 0.002*"Preise" + 0.002*"Lage" + 0.002*"Krieges"

topic #2 (0.010): 0.003*"Mark" + 0.002*"Altona" + 0.002*"Regierung" + 0.002*"Stadt" + 0.002*"Kaiser" + 0.001*"Verein" + 0.001*"Antrag" + 0.001*"Millionen" + 0.001*"Meter" + 0.001*"Sonnabend"

topic #0 (0.010): 0.003*"Truppen" + 0.003*"Krieg" + 0.002*"Pastor" + 0.002*"England" + 0.002*"Regiment" + 0.002*"englischen" + 0.002*"Stadt" + 0.002*"Soldaten" + 0.002*"Reserve" + 0.002*"Frankreich"

topic #4 (0.010): 0.014*"gesucht" + 0.012*"Altona" + 0.006*"Gesucht" + 0.006*"billig" + 0.005*"Mädchen" + 0.004*"verkaufen" + 0.003*"Wohn" + 0.003*"Part" + 0.003*"Küche" + 0.003*"Zimmer"
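Picking out the war-related topics from a printout like the one above can be done by eye, but it can also be sketched in code. The snippet below is only an illustration and not part of the project: it parses topic lines in the printed format and ranks them by the summed weight of a few seed words. The seed list and the names parse_topic and war_score are assumptions, and the topic strings are shortened copies of three of the lines above.

```python
import re

# Matches terms such as 0.003*"Krieg"; curly quotes are accepted too,
# since blog software often replaces straight quotes.
TERM_RE = re.compile(r'([0-9.]+)\*["“”]([^"“”]+)["“”]')


def parse_topic(line):
    """Return {word: weight} for one printed topic line."""
    return {word: float(weight) for weight, word in TERM_RE.findall(line)}


def war_score(topic, seed_words):
    """Sum the weights of all seed words that occur in the topic."""
    return sum(weight for word, weight in topic.items() if word in seed_words)


# Shortened versions of three topics from the printout (topic id -> terms).
topics = {
    0: '0.003*"Truppen" + 0.003*"Krieg" + 0.002*"Pastor" + 0.002*"Soldaten"',
    5: '0.003*"Krieg" + 0.003*"England" + 0.002*"Front" + 0.002*"Truppen"',
    4: '0.014*"gesucht" + 0.012*"Altona" + 0.006*"billig"',
}

# Hypothetical seed words for "war"; choosing them is a judgment call.
SEED = {"Krieg", "Front", "Truppen", "Soldaten"}

scores = {tid: war_score(parse_topic(line), SEED) for tid, line in topics.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With such small weights the ranking is sensitive to the choice of seed words, so it is best read as a rough aid for inspection rather than a classifier.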

Due to the size of the dataset, we could not finish training the dynamic topic model in time; however, the code provided allows it to be completed (we plan to finish the training in the near future).
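A dynamic topic model needs the documents in chronological order, together with the number of documents in each time period. As a hedged sketch of that preparation step (the yearly granularity and the name build_time_slices are assumptions, not taken from the project code):

```python
from collections import Counter


def build_time_slices(dated_docs):
    """Sort (year, tokens) pairs by year and count the documents per year.

    Returns the token lists in chronological order, the sorted years,
    and the per-year document counts in the same order as the years.
    """
    ordered = sorted(dated_docs, key=lambda pair: pair[0])
    per_year = Counter(year for year, _ in ordered)
    years = sorted(per_year)
    docs = [tokens for _, tokens in ordered]
    time_slice = [per_year[year] for year in years]
    return docs, years, time_slice


# Toy input: (year, tokens) pairs in arbitrary order.
dated_docs = [
    (1914, ["Krieg", "Truppen"]),
    (1913, ["Kaiser", "Stadt"]),
    (1914, ["Front", "England"]),
    (1918, ["Frieden", "Regierung"]),
]
docs_sorted, years, time_slice = build_time_slices(dated_docs)
```

If the dynamic model is trained with Gensim's LdaSeqModel (an assumption about the setup), the counts in time_slice are what its time_slice argument expects, alongside a bag-of-words corpus built from docs_sorted.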

Step 3: Sentiment Analysis

Last but not least, we wanted to know how the newspaper writers felt about the development of what was supposed to be a quick war but ultimately wasn't. With the help of germanlex, a word sentiment dictionary, we determined the polarity of the words found in the newspapers. What we expected to be a general increase in negativity turned out to be quite the opposite. As can be seen in the graph below, spirits started high and only got better, if they changed at all.

https://plot.ly/~catb0y/2/
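The dictionary-based polarity scoring described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the tiny LEXICON with its scores is made up for the example and does not reflect germanlex's actual format or values, and the names polarity and polarity_by_year are hypothetical.

```python
from collections import defaultdict

# Made-up word polarities in [-1, 1] (assumption); a real run would load
# the entries from the sentiment dictionary instead.
LEXICON = {"Sieg": 1.0, "Frieden": 0.5, "Not": -0.5, "Tod": -1.0}


def polarity(tokens, lexicon):
    """Mean polarity of the tokens found in the lexicon; 0.0 if none are found."""
    scores = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def polarity_by_year(dated_docs, lexicon):
    """Average the per-document polarity for each year."""
    by_year = defaultdict(list)
    for year, tokens in dated_docs:
        by_year[year].append(polarity(tokens, lexicon))
    return {year: sum(vals) / len(vals) for year, vals in sorted(by_year.items())}


dated_docs = [
    (1914, ["Sieg", "Truppen", "Tod"]),
    (1914, ["Sieg"]),
    (1918, ["Frieden", "Not"]),
]
yearly = polarity_by_year(dated_docs, LEXICON)
```

Averaging per document first and then per year keeps one very long issue from dominating a year; averaging over all words of a year directly would be an equally defensible choice.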
