Algorithm to implement a word cloud like Wordle
I

13

214

Context

My Questions

  • Is there an algorithm available that does what Wordle does?
  • If not, what are some alternatives that produce similar kinds of output?

Why I'm asking

  • just curious
  • want to learn
Incondensable answered 5/12, 2008 at 1:50 Comment(2)
There's an alternate implementation, based on image processing here. Not very speedy, but very flexible and good for experimentation. (There's a full implementation given in Mathematica.)Effortless
I came up with my own (pretty simple) algorithm and blogged about it. It's written in Python and should be easy to customize. I tried to make it half-way efficient.Mollymollycoddle
A
514

I'm the creator of Wordle. Here's how Wordle actually works:

Count the words, throw away boring words, and sort by the count, descending. Keep the top N words for some N. Assign each word a font size proportional to its count. Generate a Java2D Shape for each word, using the Java2D API.
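
As a rough illustration of the counting and sizing step, here is a minimal Python sketch. It is not Wordle's code (Wordle is written in Java); the function name `weighted_words`, the tiny `STOP_WORDS` set, and the `max_font_size` parameter are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}

def weighted_words(text, top_n=100, max_font_size=96.0):
    """Return [(word, count, font_size)] sorted by count, descending."""
    # Lowercase the text and pull out runs of letters (with an optional apostrophe part).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Throw away the "boring" words, then count what is left.
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)  # keep the top N words
    if not top:
        return []
    largest = top[0][1]
    # Font size proportional to count; the most frequent word gets max_font_size.
    return [(w, c, max_font_size * c / largest) for w, c in top]
```

Calling `weighted_words(text, top_n=50)` on a document gives the list that the layout step would then consume, largest word first.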

Each word "wants" to be somewhere, such as "at some random x position in the vertical center". In decreasing order of frequency, do this for each word:

place the word where it wants to be
while it intersects any of the previously placed words
    move it one step along an ever-increasing spiral
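
The loop above can be sketched in Python as follows. This is a simplified illustration, not the original implementation: it tests plain axis-aligned bounding boxes instead of Java2D shapes, uses an Archimedean spiral, and the names `place_words`, `step`, `spread`, and `max_steps` are assumptions. Words are allowed to drift outside the nominal canvas.

```python
import math
import random

def intersects(a, b):
    """Axis-aligned rectangles as (x, y, w, h); True if they overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def place_words(sizes, width=800, height=600, step=0.1, spread=2.0,
                max_steps=100_000, rng=None):
    """sizes: list of (w, h) boxes, most frequent word first.
    Returns a list of (x, y, w, h), with None where no spot was found."""
    rng = rng or random.Random(0)
    placed = []
    result = []
    for w, h in sizes:
        # Where the word "wants" to be: a random x, vertically centered.
        x0 = rng.uniform(0, width - w)
        y0 = (height - h) / 2
        box = (x0, y0, w, h)
        t = 0.0
        steps = 0
        while any(intersects(box, p) for p in placed):
            steps += 1
            if steps > max_steps:
                box = None  # give up on this word
                break
            t += step
            r = spread * t  # radius grows with the angle: an ever-increasing spiral
            box = (x0 + r * math.cos(t), y0 + r * math.sin(t), w, h)
        result.append(box)
        if box is not None:
            placed.append(box)
    return result
```

The linear scan over `placed` inside the `while` condition is the expensive part, which is why the intersection test is where the optimization effort goes.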

That's it. The hard part is in doing the intersection-testing efficiently, for which I use last-hit caching, hierarchical bounding boxes, and a quadtree spatial index (all of which are things you can learn more about with some diligent googling).
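
To give a feel for one of these techniques, here is a small Python sketch of last-hit caching over plain rectangles. It is an assumption-laden illustration, not Wordle's code: the class name `CollisionChecker` and its `tests` counter are made up for the example, and the hierarchical bounding boxes and quadtree are not shown. The idea is that a word moving one small step along the spiral will most likely still overlap the same word it hit last time, so that word is tested first.

```python
class CollisionChecker:
    """Rectangle collision tests with last-hit caching (illustrative sketch)."""

    def __init__(self):
        self.placed = []      # rectangles (x, y, w, h) already laid out
        self.last_hit = None  # the rectangle the candidate most recently hit
        self.tests = 0        # number of rectangle tests, for illustration only

    def _hit(self, a, b):
        self.tests += 1
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    def collides(self, box):
        # Cheap path: try the rectangle that blocked the previous attempt.
        if self.last_hit is not None and self._hit(box, self.last_hit):
            return True
        # Slow path: scan everything else and remember the first hit.
        for p in self.placed:
            if p is not self.last_hit and self._hit(box, p):
                self.last_hit = p
                return True
        return False

    def add(self, box):
        self.placed.append(box)
        self.last_hit = None  # the next word starts with a clean cache
```

In a placement loop, `collides` would replace the linear scan over all placed words, and `add` would be called once a free position is found.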

Edit: As Reto Aebersold pointed out, there's now a book chapter, freely available, that covers this same territory: Beautiful Visualization, Chapter 3: Wordle

Acidophil answered 25/9, 2009 at 16:33 Comment(6)
More information here: static.mrfeinberg.com/bv_ch03.pdf - Thanks Jonathan.Deirdre
Thanks for the info Jonathan - I'm fascinated by relatively simple algorithms that can create great visualisations like this.Noheminoil
I'm still a bit puzzled on the "wants to be somewhere" part. Is the initial position of words really random?Guilford
Thank you for the explanation. What does "each word wants to be somewhere" mean?Schrecklichkeit
It means that some data structure, which represents the word to be placed, starts out with a preferred x and y coordinate.Acidophil
I now realize much of the Wordle IP belongs to IBM (re: the FAQ), but IBM has in the past made a lot of stuff open source, including Eclipse, OpenUP, etc. Wordle is being used a lot in education, which is philosophically an "open" environment. Have you thought of suggesting to IBM that Wordle be added to the list of their important Open Source contributions?Arjun
L
38

Here's a really nice JavaScript one from Jason Davies that uses d3. You can even use webfonts with it.

Demo: http://www.jasondavies.com/wordcloud/

GitHub: https://github.com/jasondavies/d3-cloud

Lour answered 8/5, 2012 at 22:14 Comment(3)
It's very easy to just copy the src=".js" files and reupload them for building onto or just using as is. Thanks for sharing and works great!Arabesque
Is there a way to change the color palette to something more aesthetic? I tried modifying the js file from the JSON call from: colourlovers.com/api/palettes/random to colourlovers.com/api/palettes/top as the colourlovers' API recommends but the palette remained the same.Arabesque
Here is a responsive working example based on the demo but with full control on words and color. For a custom color palette please use the commented code, instead. jsbin.com/kiwojayoye/1/edit?html,js,outputSliwa
D
32

I've implemented the algorithm described by Jonathan Feinberg in Python to create a tag cloud. It is far from the beautiful clouds of wordle.net, but it gives you an idea of how it could be done.

You can find the project here.

Deirdre answered 3/8, 2010 at 7:57 Comment(2)
Link (labs.atizo.com) is broken again. You really should post a sample image or two so we can see the comparison.Ruthannruthanne
@RetoAebersold is there any way to integrate this code with the Flask or Django framework?Chunky
A
31

I've created a Silverlight component that uses the algorithm Jonathan suggests here. The source code and example projects are all available on my blog:

http://whydoidoit.com

Color word cloud

My cloud lets you color and size words based on different weightings and it supports word selection (from a coordinate) and selected word highlighting. The source is yours to use as you see fit.

Example Word Cloud

Ailin answered 29/7, 2011 at 17:26 Comment(2)
Your blog seems to be empty. Has the link died?Inexpedient
Here is the archive.org snapshot, web.archive.org/web/20110820202717/http://whydoidoit.com/… I also found the project on github github.com/whydoidoit/WordCloudEuphony
B
14

I'm working on WordCram, a Processing library for making word clouds. It's pretty heavily influenced by Wordle, and is informed by the same PDF aeby linked to above. It handles the collision detection for you, and lets you focus on how you want your words laid out, colored, rotated, etc.

Brahmin answered 2/4, 2011 at 21:21 Comment(2)
Does your service offer an API?Jettiejettison
Sorry, WordCram doesn't have an API. It's a library, not a service.Brahmin
P
10

http://code.google.com/apis/visualization/documentation/gallery.html

Check out the word cloud visualization. Not as fancy as wordle.net but real easy to add to your site.

Phyla answered 23/12, 2008 at 23:33 Comment(0)
C
8

I was looking for a Wordle-like visualization that would let me assign the color, initial position, and size of a string based on other data, such as its relevance within a text. I didn't find anything, but thanks to the information I found here (especially Jonathan's explanation and aeby's link), I could finally implement 'Cloudio', which comes relatively close to Wordle (at least I think so...) and offers the features I was looking for.

It is implemented with SWT and JFace, and I tried to integrate it into the MVC model of JFace, so that you can set content and label providers to modify the layout of a cloud and add it to other Eclipse plugins or RCP apps. You can also modify the way the initial position of a string is calculated, so it is not difficult to use it for cluster visualization and similar purposes. It is still poorly documented and limited in some ways (and I did the initial upload a few hours ago, so it might still be a bit buggy), but if you're interested, here's the link:

And here's a link to some created clouds, in case you want a quick impression: https://github.com/sschwieb/Cloudio/wiki/Example-Clouds

Cheers, Stephan

Conlin answered 15/6, 2011 at 19:30 Comment(0)
N
8

Here is my implementation of a Wordle-like cloud. It uses the same spiral algorithm and a quadtree data structure.

http://sourcecodecloud.codeplex.com

or

http://www.codeproject.com/Articles/224231/Word-Cloud-Tag-Cloud-Generator-Control-for-NET-Win

Neo answered 19/7, 2011 at 10:44 Comment(2)
sourcecodecloud is not downloadable, and the second link is not working.Cogan
I verified first link's Source Code / Download. It worked. Second link was moved. Now fixed.Neo
A
7

Lion and Lamb is an open-source iOS app that creates word clouds using the most frequent words from a chosen book of the Bible.

It's based on the algorithm as described by Jonathan Feinberg. Hit testing does utilize a quad tree, but the bounding boxes are based on the glyph's bounding rectangle. I want to break the glyph down into many smaller bounding rects to enable word placement within a glyph's bounding box.
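
The idea of breaking a glyph into smaller bounding rectangles can be sketched in Python as below. This is not code from Lion and Lamb (an iOS app); it is a hypothetical illustration that assumes the rendered word is available as a boolean bitmap `mask` (rows of True/False, True meaning ink), and the names `ink_rects` and `min_size` are made up for the example.

```python
def ink_rects(mask, x, y, w, h, min_size=4):
    """Recursively split the region (x, y, w, h) of a boolean bitmap into
    smaller rectangles, keeping only the ones that contain ink."""
    if not any(mask[r][c] for r in range(y, y + h) for c in range(x, x + w)):
        return []              # empty region: nothing here to collide with
    if w <= min_size and h <= min_size:
        return [(x, y, w, h)]  # small enough: keep it as a leaf rectangle
    # Split the longer side in half and recurse into both parts.
    if w >= h:
        half = w // 2
        return (ink_rects(mask, x, y, half, h, min_size)
                + ink_rects(mask, x + half, y, w - half, h, min_size))
    half = h // 2
    return (ink_rects(mask, x, y, w, half, min_size)
            + ink_rects(mask, x, y + half, w, h - half, min_size))
```

For an "L"-shaped glyph, the empty upper-right area produces no rectangles, so another word tested against the leaf rectangles rather than the full bounding box could be placed there.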

GitHub: https://github.com/PetahChristian/LionAndLamb

A word cloud of the Bible book of Revelation

Araiza answered 4/8, 2015 at 20:42 Comment(0)
R
4

I have a Tag Cloud generator here, which I call Disorganizer :)

The sources are TagCloudService, the Razor markup control, and a WinForms app for testing purposes. With a little wrapper around it, you can put it in your blog, profile, etc. It uses C# 4.0 and the System.Drawing namespace heavily.

I created it because with the other cloud generators you cannot click on tags to navigate and cannot create hover animations to show that they are clickable. Since showing hover animation in HTML is necessary for me (I'm doing this with overlaid, absolutely positioned <a> tags), I haven't developed any-angle word display; words are either vertical or horizontal.

Warning: The above links may become invalid in a few months, as I plan to slowly separate this from the surrounding project into a project of its own.

You can see a working demo on this sample blog post, but it is incomplete, and in an incomplete site. Contact me if anyone wants to contribute, I will get on with separating it out asap.

Rift answered 10/6, 2011 at 17:33 Comment(2)
Links have gone invalid. I like the UI on your blog.Crookes
Thanks, just fixed themRift
C
2

Here is yet another end-to-end implementation of wordle in Python 3 largely based on the initial outline by Jonathan Feinberg (QuadTrees, spirals, etc.).

The code (commented, with a detailed ReadMe file) is freely available at this GitHub repository, and this is a sample wordle created with the code.

Macbeth

Costrel answered 27/4, 2019 at 16:49 Comment(0)
F
1

I've implemented a word cloud generator called WordCloud.jl in the Julia language. A brief description of its algorithm can be found here.
Unlike most other implementations, I designed it based on gradient optimization. It’s a non-greedy algorithm in which words can be further moved after they are positioned. Thus the size of the words and the shape and size of the background mask can be kept unchanged in the generation process. This makes the outputs more accurate and easy to customize. Furthermore, we can also generate some fancy outputs like these:
Comparison of Obama's and Trump's inaugural address and Wikipedia: Julia
comparison wordcloud

julia wordcloud

Ferromagnetism answered 25/4, 2021 at 9:12 Comment(2)
While this link may answer the question, it is better to include the essential parts of the algorithm description here and provide the link for reference.Flutter
Thank you @nik7. I've added some algorithm description as you suggested.Ferromagnetism
B
0

There is a pretty nice little JavaScript library made by Tim Dream:

https://github.com/timdream/wordcloud2.js/blob/gh-pages/API.md

It can create a word cloud on a canvas or with HTML tags with a lot of options to modify the result. It comes really close to wordle's output.

Breechcloth answered 17/6, 2020 at 20:3 Comment(1)
Can't you just do something on your own ? Where is the curiosity? Dead alreadyEsemplastic
