Make Amazing Wordcloud in Python


When working with data, representing it in an intuitive and consumable form is a crucial task.

A visual representation of data helps reveal trends and outliers that may be difficult or time-consuming to find when analyzing the data numerically.

When the data is numeric, one can make different types of charts. When the data is text (“strings”), word clouds are a handy way to visualize it.

Let’s dive into how we can make a “Wordcloud” using Python:

To make a WordCloud in Python we will need the “wordcloud” module.

To install the ‘wordcloud’ module, run the following command in cmd:

pip install wordcloud

Wordcloud depends on the “NumPy” module for efficient array calculations and on “Pillow” for image-processing tasks.

Additionally, we can save and show the generated wordclouds using the ‘matplotlib’ module, and the last example uses ‘scipy’ for edge detection.

Install the below additional modules before jumping to code:

Additional Modules required (Pre-requisite setup)

pip install numpy
pip install matplotlib
pip install pillow
pip install scipy
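
Before moving on, it can help to confirm that the modules are importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_modules` is made up for this illustration. Note that it checks import names, which can differ from the pip package names (pillow is imported as PIL).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names, not pip names: the pillow package is imported as PIL.
required = ["wordcloud", "numpy", "matplotlib", "PIL", "scipy"]
print("Missing:", missing_modules(required) or "none")
```

If anything is listed as missing, install it with pip and run the check again.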

Data for wordcloud

Any text file can be used as ‘data’ for making a wordcloud.

For this tutorial we will use the play “Romeo & Juliet” in text format.

Download the Romeo & Juliet .txt file: Romeo and Juliet.txt

Now, we are ready to dive into the code and generate some “WooooordCloud”.

Simple Rectangular Wordcloud

First rule of Python: “Import the necessary modules”

import numpy as np  # necessary for wordcloud
from PIL import Image, ImageOps  # pillow module, necessary for wordcloud
import matplotlib.pyplot as plt  # to show and save images
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator  # the real wordcloud module
from scipy.ndimage import gaussian_gradient_magnitude  # edge detection

Now, we will read the text from the downloaded text file.
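
A minimal sketch of this step, assuming the download was saved as "Romeo and Juliet.txt" in the working directory and is UTF-8 encoded (both are assumptions; adjust the name and encoding to your file). The helper `load_text` is a name chosen for this illustration.

```python
from pathlib import Path


def load_text(path, encoding="utf-8"):
    """Return the full contents of a text file as one string."""
    return Path(path).read_text(encoding=encoding)


# Assumed file name: use whatever name you saved the download under.
text_path = Path("Romeo and Juliet.txt")
if text_path.exists():
    text = load_text(text_path)
    print(f"Loaded {len(text)} characters")
```

The resulting string `text` is what the wordcloud examples pass to `generate`.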


Generate the WordCloud, then save and display the output

canvas_width = 1920  # width of the output image (1920 is an assumed value)
canvas_height = 1080  # height of the output image
wordcloud = WordCloud(width=canvas_width, height=canvas_height).generate(text)  # generate wordcloud
wordcloud.to_file("simple_wordcloud.png")  # save the output wordcloud in png format
plt.imshow(wordcloud, interpolation='bilinear')  # draw the image output
plt.show()  # open the plot window (needed outside notebooks)

Output of simple wordcloud in Python

After running the above steps you will be able to generate a simple wordcloud with default parameters.

But wordcloud is more creative than just making a rectangle with words in it. Let’s explore what else we can do with wordclouds.

Before moving forward, note that by default wordcloud generates a different random layout on every run. This randomness can be fixed by setting a seed in the ‘random_state’ parameter of the WordCloud constructor.

wordcloud = WordCloud(random_state=1).generate(text)  # replace 1 with any other number to get a different result
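
To see why a fixed seed gives a repeatable result, here is a small standard-library analogy. It is not the wordcloud module's own code, and `sample_positions` is a made-up helper: it simply draws a few pseudo-random positions from a seeded generator, the way a layout routine might.

```python
import random


def sample_positions(seed, count=3, width=400, height=200):
    """Draw `count` pseudo-random (x, y) positions from a seeded generator."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(count)]


same_a = sample_positions(seed=1)
same_b = sample_positions(seed=1)
print(same_a == same_b)  # True: same seed, same sequence
```

The same idea applies to ‘random_state’: keep the seed and the other parameters unchanged, and the layout should come out the same on every run.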

List of wordcloud parameters

  • width – set the canvas width
  • height – set the canvas height
  • max_font_size – set the maximum size of a word
  • min_font_size – set the minimum size of a word
  • background_color – set the background color of the canvas
  • mask – make the wordcloud in a specific region
  • random_state – set the seed (stops generating random output)
  • relative_scaling – scale word size based on their frequency
  • colormap – add a color scheme to the words
  • contour_width – set the width of the contour
  • contour_color – set the color of the contour
  • stopwords – exclude words from the wordcloud

Basic customization in wordcloud

Let’s try to change:

  • background_color to white
  • max_font_size to 800
  • min_font_size to 20
  • colormap to ‘hot’
  • random_state to 1
  • add stopword ‘thy’

All the above customizations can be done by passing values to the corresponding parameters.

stopwords = set(STOPWORDS)
stopwords.add('thy')  # exclude 'thy' in addition to the default stopwords
wordcloud = WordCloud(
    stopwords=stopwords,
    background_color='white',
    random_state=1,
    colormap='hot',
    max_font_size=800,
    min_font_size=20,
    width=canvas_width,
    height=canvas_height,
).generate(text)

Customized Wordcloud
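
To get a feel for what excluding stopwords means, the sketch below counts word frequencies by hand and skips a small stopword set. It is only an illustration: `word_frequencies` and the tiny `demo_stopwords` set are made up here, and the simple regular expression is not how the wordcloud module tokenizes text.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords):
    """Count lowercase words in `text`, skipping anything in `stopwords`."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


sample = "Romeo, Romeo, wherefore art thou Romeo? Deny thy father and refuse thy name"
demo_stopwords = {"and", "thy"}  # tiny hypothetical list, not the library's STOPWORDS
freqs = word_frequencies(sample, demo_stopwords)
print(freqs.most_common(1))
```

A frequency dictionary like this can also be passed to `WordCloud.generate_from_frequencies`. In that case the ‘stopwords’ parameter is ignored, so any filtering has to happen beforehand, as it does here.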

Simple Masked WordCloud

In this segment we will generate a wordcloud that is confined within a boundary.

A mask image can be a binary black-and-white image: the words are drawn only in the black area, and the white area is left empty.
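
A mask does not have to come from an image file. As a minimal sketch, the array below builds a circular mask directly with NumPy, following the convention that white (255) pixels are masked out and everything else is free to draw on. The function name `circle_mask` and its sizes are assumptions chosen for this illustration.

```python
import numpy as np


def circle_mask(size=300, radius=130):
    """Return a size x size uint8 array: 0 inside the circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    centre = size // 2
    outside = (x - centre) ** 2 + (y - centre) ** 2 > radius ** 2
    return (255 * outside).astype(np.uint8)


mask = circle_mask()
print(mask.shape, mask[150, 150], mask[0, 0])
```

Passing such an array as `WordCloud(mask=mask)` should confine the words to the circle.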

For this tutorial we will use the following image:

romeo_mask = np.array(Image.open("sample_data/romeo_mask.jpg"))  # read image
wc = WordCloud(mask=romeo_mask, colormap='inferno', random_state=5, max_font_size=50, min_font_size=0)  # configure wordcloud
wc.generate(text)  # generate wordcloud with text data
wc.to_file("masked_wordcloud.png")  # save image (file name is an assumed example)
plt.imshow(wc, interpolation='bilinear')  # show image
plt.axis("off")  # turn off the axes on the image
plt.show()

Output of masked wordcloud

Masked Word Cloud output

Masked with Image Color retained Wordcloud

The example above uses a black-and-white image for masking. In this section we will use a color image to generate a masked wordcloud, and the words will also take their colors from the image.
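
Roughly speaking, sampling a color for a word means averaging the image pixels under the area where the word is placed. The sketch below shows that idea on a tiny made-up image. It is a simplified illustration, not the code of `ImageColorGenerator`, and `mean_patch_color` is a hypothetical helper.

```python
import numpy as np


def mean_patch_color(image, top, left, height, width):
    """Average the RGB values of a rectangular patch of an H x W x 3 array."""
    patch = image[top:top + height, left:left + width, :3]
    r, g, b = patch.reshape(-1, 3).mean(axis=0)
    return f"rgb({int(r)}, {int(g)}, {int(b)})"


# Tiny made-up image: left half pure red, right half pure blue.
demo = np.zeros((4, 4, 3), dtype=np.uint8)
demo[:, :2] = (255, 0, 0)
demo[:, 2:] = (0, 0, 255)
print(mean_patch_color(demo, 0, 0, 4, 2))  # rgb(255, 0, 0)
```

A word sitting over the red half would come out red, and one over the blue half would come out blue.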

William Shakespeare author of Romeo and Juliet

Image used as the mask:

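image = np.array(Image.open("sample_data/romeo_color.jpg"))  # read the color image

image_mask = image.copy()
image[image_mask.sum(axis=2) == 0] = 255  # set pure-black pixels of the color image to white

# edge detection on each color channel, averaged over the three channels
edges = np.mean([gaussian_gradient_magnitude(image[:, :, i] / 255., 2) for i in range(3)], axis=0)
image_mask[edges > .1] = 255  # mask out strong edges so color regions stay separated
wc = WordCloud(background_color='black', mask=image_mask, mode='RGBA')
wc.generate(text)  # generate wordcloud with text data
image_colors = ImageColorGenerator(image)  # sample colors from the image
wc.recolor(color_func=image_colors)  # recolor the words with the sampled colors
plt.figure(figsize=(10, 10))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()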

Output of color masked Wordcloud

Image color masked word cloud

Try playing with different parameters, make some awesome wordclouds and share them in the comments.

Working code to make a wordcloud in Python: Click Here

The WordCloud code is also available in an online Python compiler. Fork it, edit it, recreate it: Click Here

Wordcloud Module : Documentation

Thank you for reading and happy learning. Drop your suggestions in the comments.
