Word Cloud
Introduction
A word cloud (or tag cloud) is a visual representation of textual data based on a weighted metric. Tagg Johnson is a man obsessed with counting words that appear in online documents. On his computer, he keeps a spreadsheet of all the sites he visits, along with a list of words that appear on each site and the number of times such word appears. Tagg would like to generate word clouds based on the data he has collected. Before describing the algorithm Tagg uses for generating clouds, we digress for a quick lesson in typography. The basic unit of measure is known as a point (typically abbreviated as pt). A font's size is described based on the vertical number of points from one line to the next, including any interline spacing. For example, with a 12pt font, the vertical space from the top of one character to the top of a character below it is 12 points. We assume that a character's height is precisely equal to the font's point size (regardless of whether the character is upper or lower case).
For this problem, we focus on a fixed-width font, such as Courier, in which each character of the alphabet is also given the same amount of width. The character width for such a font depends on the font size and the aspect ratio. For Courier, a word with t characters rendered in a font of size P has a total width of ⌈916⋅t⋅P⌉ when measured in points. Note well the use of the ceiling operator, which converts any noninteger to the next highest integer. For example, a 5-letter word in a 20pt font would be rendered with a height of 20 points and a width equal to ⌈90016⌉=⌈56.25⌉=57 points.
Tagg pre-sorts his word list into alphabetical order and removes words that do not occur at least five times. For each word w, he computes a point size based on the formula P=8+⌈40(cw−4)(cmax−4)⌉, where cw is the number of occurrences of the word, and cmax is the number of occurrences of the most frequent word in the data set. Note that by this formula, every word will be rendered with anywhere from a 9pt font to a 48pt font. He then places the words in rows, with a 10pt horizontal space between adjacent words, placing as many words as fit in the row, subject to a maximum width W for his entire cloud. The height of a given row is equal to the maximum font size of any word rendered in that row.
Solution
For all words in Tagg's list, find the font size and height of the word. Then, using the given width of the block and 10 pt space between each of the words, find out which words (starting from the beginning of Tagg's list) will fit within the width and the "tallest" word. Continue until all words are used and add the max heights in each of the rows.
Code
Solution - Java
import java.util.*;
/* author: Christine Zhou
* Problem V: Word Cloud
*/
public class WordCloud {
public static void main(String args[]) {
WordCloud w = new WordCloud();
w.cloud();
}
public static void cloud() {
int n = 0;
Scanner in = new Scanner(System.in);
int width = in.nextInt();
int words = in.nextInt();
while (width!= 0) {
n++;
ArrayList<Integer> wordlength = new ArrayList<Integer>();
ArrayList<Integer> occ = new ArrayList<Integer>();
for (int k = 0; k < words; k++) {
wordlength.add((in.next()).length());
occ.add(in.nextInt());
}
Integer max = Collections.max(occ);
double cmax = 0 + max;
int[] ptsize = new int[words];
double[] wordwidth = new double[words];
for (int k = 0; k < words; k ++) {
ptsize[k] = (int) (8 + Math.ceil(40*(occ.get(k)-4)/(cmax-4)));
double j = (double) wordlength.get(k);
double l = (double) ptsize[k];
double m = (double) (j*l*9/16);
wordwidth[k] = Math.ceil(m);
}
int wordheight = 0;
int mywidth = 0;
int hspace = 10;
ArrayList<Integer> Heights = new ArrayList<Integer>();
for (int k = 0; k < words; k++) {
mywidth += (int) wordwidth[k];
if (mywidth > width) {
mywidth = (int) wordwidth[k];
wordheight += Collections.max(Heights);
Heights.clear();
}
mywidth+=hspace;
Heights.add(ptsize[k]);
}
if (Heights.size()!= 0) wordheight+= Collections.max(Heights);
System.out.println("CLOUD " + n + ": " + wordheight);
Heights.clear();
width = in.nextInt();
words = in.nextInt();
}
in.close();
}
}