Introduction
Sarcasm detection has long been a difficult problem in Natural Language
Processing (NLP). Sarcasm that goes unrecognized causes misunderstanding
in everyday communication and poses problems for many NLP systems. Many
approaches have been tried, including rule-based, statistical, and
machine-learning-based systems. Rule-based systems are onerous to program,
and they struggle to capture the context or meaning of words.
The Problem Statement
Sarcasm detection has received considerable attention in the NLP community
in recent years. Many computational approaches model either the utterance in
isolation or the utterance together with contextual information such as
conversation context, author context, visual context, or cognitive features.
Here, I present a deep neural network-based sarcasm detection technique
applied to datasets with and without conversational context. Two kinds of
datasets were used: conversations from Twitter and Reddit threads, and news
headlines. The goal is to understand how much context matters in detecting
sarcasm.
The Dataset
Dataset with Conversational Context
This dataset was built from Twitter and Reddit conversations. It has 3
columns:
1. label: 0 indicates not sarcastic and 1 indicates sarcastic.
2. context: A list of 2 elements. The first element is the first sentence in
the conversation and the second element is the third (final) statement in the
conversation. This final statement is the one classified as sarcastic or not.
3. response: The second statement in the conversation, that is, the reply to
the initial comment.
The dataset contains 9,400 entries in total. Even though it is small, we’ll
soon see that the model trained on it produces better output than the one
trained on the larger headline dataset.
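To make the three columns concrete, here is a minimal sketch of reading one record. It assumes the data is stored as JSON Lines with exactly the column names listed above; the sample texts are invented for illustration and are not taken from the dataset. The helper names (parse_record, to_model_text) and the " [SEP] " separator are hypothetical, and joining the turns into one string is only one possible way to feed context to a model, not necessarily the preprocessing used here.

```python
import json

# Hypothetical record in JSON Lines form. The field names follow the column
# list above; the texts are invented purely for illustration.
SAMPLE_LINE = json.dumps({
    "label": 1,
    "context": ["I waited two hours for the bus.", "Oh sure, right on time."],
    "response": "Did it at least show up?",
})


def parse_record(line):
    """Parse one JSON line into (label, context, response) and validate it."""
    record = json.loads(line)
    label = int(record["label"])
    context = list(record["context"])
    response = str(record["response"])
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    if len(context) != 2:
        raise ValueError(f"context must have 2 elements, got {len(context)}")
    return label, context, response


def to_model_text(context, response, sep=" [SEP] "):
    """Join the turns in conversation order: first, response, final.

    The separator and the joining step are assumptions, not the
    documented preprocessing of this project.
    """
    first, final = context
    return sep.join([first, response, final])


label, context, response = parse_record(SAMPLE_LINE)
text = to_model_text(context, response)
print(label, text)
```

Keeping the turns in conversation order means the statement being classified always appears last, after the text that gives it context.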
Dataset without Conversational Context
This dataset consists of news headlines. It has 2 columns:
1. is_sarcastic: 0 indicates not sarcastic and 1 indicates sarcastic.
2. headline: The headline to be classified as sarcastic or non-sarcastic.
The dataset contains 55.3k entries in total.
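Both datasets are text plus a binary label, so each text has to become a fixed-length sequence of integer ids before it can be fed to a network. The sketch below shows one simple way to do that in plain Python. The length of 100 is an assumption taken from the 100x32 embedding output mentioned in the architecture; the tokenization rule, the reserved ids (0 for padding, 1 for unknown words), and the two sample headlines are illustrative choices, not details from this project.

```python
import re

MAX_LEN = 100  # assumed sequence length, matching the 100x32 embedding output


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocab(texts):
    """Assign ids from 2 upward; 0 is reserved for padding, 1 for unknowns."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab) + 2)
    return vocab


def encode(text, vocab, max_len=MAX_LEN):
    """Map tokens to ids, truncate to max_len, and right-pad with zeros."""
    ids = [vocab.get(token, 1) for token in tokenize(text)][:max_len]
    return ids + [0] * (max_len - len(ids))


# Invented headlines, for illustration only.
rows = [
    {"is_sarcastic": 1, "headline": "Area man wins argument with himself"},
    {"is_sarcastic": 0, "headline": "City council approves new budget"},
]
vocab = build_vocab(row["headline"] for row in rows)
encoded = [encode(row["headline"], vocab) for row in rows]
labels = [row["is_sarcastic"] for row in rows]
print(encoded[0][:8], labels)
```

Every encoded row has the same length, so a batch of rows forms a rectangular array that an embedding layer can consume directly.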
Proposed Solution
Two models were built, one for each dataset, with more or less the same
basic architecture.
The Architecture
1. Embedding: Accepts the tokenized comment and encodes each token as a
vector of size e. The output is a matrix of size 100x32 (sequence length x
embedding size).
2. Convolution: The data then undergoes 1-dimensional convolution. This layer
builds features from 14-word combinations.
3. Max Pooling: This layer downsamples the convolution output, which helps
reduce overfitting and keeps the representation small enough to stack
additional layers.
4. Convolution: This layer groups those 14-word combinations into groups of 7.
Basically, this creates groups of phrases, each 14 words long.
5. Bidirectional LSTM: Processes the sequence in both chronological and
reverse order, so each position sees context from both directions.
6. Output, Loss Function and Hyperparameters: The output layer consists of a
single sigmoid neuron, and the model is trained with the binary_crossentropy
loss. For the model trained on conversational context (Model1), the activation
function was LeakyReLU, to avoid vanishing gradients, while the other model
(Model2) used ReLU. Both models were trained using the Adam optimizer, with a
learning rate of 0.0005 for Model1.
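The layer stack above can be followed with a little arithmetic. The sketch below tracks how many time steps survive each layer and contrasts the two activation functions. Only the 100x32 embedding output comes from the description; the kernel sizes (14 and 7), the pool size (2), stride 1, the absence of padding, and the LeakyReLU slope of 0.01 are my assumptions, so the resulting numbers are illustrative rather than the models' actual shapes.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a 1-D convolution without padding ("valid")."""
    if length < kernel_size:
        raise ValueError("input is shorter than the kernel")
    return (length - kernel_size) // stride + 1


def pool1d_out_len(length, pool_size):
    """Output length of non-overlapping max pooling (stride == pool_size)."""
    return length // pool_size


def relu(x):
    """ReLU: negative inputs are clamped to zero."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    """LeakyReLU: negative inputs keep a small, non-zero slope."""
    return x if x >= 0 else negative_slope * x


# 100 and 32 come from the text; the kernel and pool sizes are assumptions.
SEQ_LEN, EMBED_DIM = 100, 32
KERNEL_1, POOL_SIZE, KERNEL_2 = 14, 2, 7

after_conv1 = conv1d_out_len(SEQ_LEN, KERNEL_1)
after_pool = pool1d_out_len(after_conv1, POOL_SIZE)
after_conv2 = conv1d_out_len(after_pool, KERNEL_2)

for name, steps in [
    ("embedding", SEQ_LEN),
    ("conv 1", after_conv1),
    ("max pooling", after_pool),
    ("conv 2", after_conv2),
]:
    print(f"{name:<12} {steps} time steps")

# A negative pre-activation is zeroed by ReLU but only scaled by LeakyReLU,
# so some gradient can still flow through that unit.
print(relu(-2.0), leaky_relu(-2.0))
```

Under these assumptions the Bidirectional LSTM reads a much shorter sequence than the 100 input tokens, and each of its time steps already summarizes a window of neighbouring words.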
Result
Since the main goal of this project was to understand the importance of
context, accuracies were not compared between the models. Instead, a few
custom inputs were entered to check the output. Model1 could easily tell a
sarcastic comment from a non-sarcastic one, while Model2 could not. Model2
only recognized a sarcastic comment when it was phrased like a news headline;
ordinary sarcastic statements went unidentified.
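Checking a custom input comes down to reading the single sigmoid output described in the architecture. The sketch below shows how a raw score becomes a probability, how that probability becomes a label, and how binary cross-entropy scores a prediction against the true label. The raw scores are made-up numbers, not outputs of Model1 or Model2, and the 0.5 decision threshold is an assumption.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_crossentropy(y_true, p, eps=1e-7):
    """Loss for one example; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def classify(p, threshold=0.5):
    """Return 1 (sarcastic) when the probability reaches the threshold."""
    return 1 if p >= threshold else 0


# Made-up raw scores standing in for the output neuron's pre-activation.
raw_scores = [2.0, -1.5, 0.0]
probs = [sigmoid(z) for z in raw_scores]
preds = [classify(p) for p in probs]

for z, p, label in zip(raw_scores, probs, preds):
    print(f"score={z:+.1f}  p(sarcastic)={p:.3f}  label={label}")

# A confident wrong answer is penalized far more than a confident right one.
print(binary_crossentropy(1, 0.9), binary_crossentropy(1, 0.1))
```

Looking at the probability rather than only the label shows how confident a model is, which is useful when comparing how two models react to the same sentence.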
Conclusion
It can be concluded that, even though the non-conversational dataset was
larger than the conversational one, the latter produced better output,
because it carried information about the scenario in which the sarcastic
statement was used.