Text Frequency Using R


Export Sentiment Value, Text Frequency, and Positive, Negative, and Neutral Words

Text contains a lot of information that is not visible at first glance. Many data analysts use scripts and tools for text analysis. Here we start with some basics: what the polarity of the text is and which words occur most frequently. This makes it easier to understand what people say about us on social channels and other platforms. From this data we can gain insights such as whether users feel positive or negative towards our brand, and which terms they use most in reviews, for example "good", "bad", or words about quality. For this analysis we use the R script below: select your CSV file, set the path where you want to save the output, and run the code.
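As a minimal sketch of the "select your CSV file and set the path" step, the snippet below reads a single text column from a CSV file. The column name `review`, the file names, and the use of `tempdir()` are assumptions made for illustration; `readLines()` would instead treat every raw line of the file, including the header row, as a document.

```r
# Hypothetical paths: replace these with your own locations.
input_path  <- file.path(tempdir(), "data.csv")
output_path <- file.path(tempdir(), "word_frequency.csv")

# Create a tiny example CSV so the sketch runs on its own.
reviews <- data.frame(
  review = c("Good quality, good price", "Bad packaging"),
  stringsAsFactors = FALSE
)
write.csv(reviews, input_path, row.names = FALSE)

# Read the CSV and keep only the (assumed) text column.
input <- read.csv(input_path, stringsAsFactors = FALSE)
text  <- input$review

print(text)
```

The resulting `text` vector can be passed to `VectorSource()` in the same way as the output of `readLines()`.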

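# Load the required packages
 library(tm)            # Corpus, tm_map, TermDocumentMatrix
 library(SnowballC)     # only needed if stemming is enabled
 library(wordcloud)     # wordcloud
 library(RColorBrewer)  # brewer.pal
 # Read the file line by line
 text <- readLines("data.csv")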
 # Load the data as a corpus
 docs <- Corpus(VectorSource(text))
 # Replace special characters with spaces
 toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
 docs <- tm_map(docs, toSpace, "/")
 docs <- tm_map(docs, toSpace, "@")
 docs <- tm_map(docs, toSpace, "\\|")
 # Convert the text to lower case
 docs <- tm_map(docs, content_transformer(tolower))
 # Remove numbers
 docs <- tm_map(docs, removeNumbers)
 # Remove common English stop words
 docs <- tm_map(docs, removeWords, stopwords("english"))
 # Remove your own stop words
 # Specify your stop words as a character vector
 docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
 # Remove punctuation
 docs <- tm_map(docs, removePunctuation)
 # Eliminate extra white spaces
 docs <- tm_map(docs, stripWhitespace)
 # Text stemming
 # docs <- tm_map(docs, stemDocument)
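 # Build a term-document matrix (terms in rows, documents in columns)
 dtm <- TermDocumentMatrix(docs)
 m <- as.matrix(dtm)
 # Sum each row to get the total frequency of every word, most frequent first
 v <- sort(rowSums(m), decreasing = TRUE)
 d <- data.frame(word = names(v), freq = v)
 # Show the ten most frequent words
 head(d, 10)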
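
To illustrate what the term-document matrix and `rowSums()` step computes, here is a minimal base-R sketch on two toy documents. The names `docs_toy` and `d_toy` are made up for this example, and `tm` applies its own tokenizer and defaults, so counts on real data may differ slightly.

```r
# Two toy documents standing in for the cleaned corpus.
docs_toy <- c("good quality good price", "bad quality")

# Split each document into words on whitespace.
words <- unlist(strsplit(docs_toy, "\\s+"))

# Count occurrences of each word and sort, most frequent first.
counts <- sort(table(words), decreasing = TRUE)
d_toy <- data.frame(word = names(counts), freq = as.vector(counts),
                    stringsAsFactors = FALSE)

print(d_toy)
```

Like `d` in the script, `d_toy` has one row per word and a `freq` column holding the total count across all documents.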

 # Draw a word cloud of the most frequent words
 wordcloud(words = d$word, freq = d$freq, min.freq = 1,
           max.words = 200, random.order = FALSE, rot.per = 0.35,
           colors = brewer.pal(8, "Dark2"))

 # Plot up to 15 of the most frequent words as a bar chart
 barplot(head(d, 15)$freq, las = 2, names.arg = head(d, 15)$word,
         col = "lightblue", main = "Most frequent words",
         ylab = "Word frequencies")


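 # Export the word-frequency table to CSV
 # (assumption: d is exported here, because xo is never defined in this script)
 write.table(d, file = "tibble_matrix.csv", sep = ",", row.names = FALSE)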
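
The script above covers word frequency only. The polarity and the positive, negative, and neutral words mentioned in the introduction could be approximated along the following lines. This is a rough sketch in base R: the tiny word lists, the helper names `classify_word` and `score_text`, and the example reviews are all hypothetical, and a real analysis would rely on a published sentiment lexicon instead.

```r
# Hypothetical mini-lexicon, for illustration only.
positive_words <- c("good", "great", "excellent", "love")
negative_words <- c("bad", "poor", "terrible", "hate")

# Classify a single word as positive, negative, or neutral.
classify_word <- function(w) {
  if (w %in% positive_words) {
    "positive"
  } else if (w %in% negative_words) {
    "negative"
  } else {
    "neutral"
  }
}

# Split a text into lower-case words, dropping empty strings.
tokenize <- function(txt) {
  words <- unlist(strsplit(tolower(txt), "[^a-z]+"))
  words[nzchar(words)]
}

# Polarity of a text: number of positive words minus number of negative words.
score_text <- function(txt) {
  words <- tokenize(txt)
  as.numeric(sum(words %in% positive_words) - sum(words %in% negative_words))
}

reviews <- c("Good quality and great price",
             "Bad packaging, terrible support",
             "Arrived on Monday")

# One sentiment value per review.
scores <- vapply(reviews, score_text, numeric(1), USE.NAMES = FALSE)

# One label per distinct word.
all_words <- unique(tokenize(reviews))
word_labels <- data.frame(
  word = all_words,
  sentiment = vapply(all_words, classify_word, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(scores)
print(word_labels)
```

A score above zero suggests a positive text, below zero a negative one, and zero a neutral one. Both `scores` and `word_labels` can be exported with `write.table()` in the same way as the frequency table.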