I’ve been using R a lot more at work lately, so I have decided to switch languages from VBA to R for my attempts at Project Euler problems. As of today I’ve solved 25 problems, 8 in the last day. I’ve found that R is much more powerful than VBA, especially with respect to handling vectors and arrays via indexing.
Here is the problem as stated:
Using names.txt, (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.
For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.
What is the total of all the name scores in the file?
The problem asks you to download a file containing a very long list of names, sort the names, and then assign each of the names a score based on their character composition and rank within the list. You are then asked to take the sum of all the scores.
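As a quick sanity check of the COLIN example, here is a minimal sketch that reproduces the arithmetic from the problem statement. It uses match() rather than the which() approach in the solutions below, and the variable names are my own, chosen only for illustration.

```r
# Split the example name into single characters ("" splits between every character)
letters.colin <- strsplit("COLIN", "")[[1]]

# match() returns each letter's position in LETTERS (A = 1, ..., Z = 26)
values.colin <- match(letters.colin, LETTERS)
values.colin                      # 3 15 12 9 14

colin.value <- sum(values.colin)  # 53
colin.score <- 938 * colin.value  # 938 is COLIN's rank, taken from the problem statement
colin.score                       # 49714
```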
Solution 1
My first solution consists of 15 lines. First, I imported the text file via read.csv() and assigned the sorted values to a vector called names.sorted. I then ran a loop iterating over each of the names, applying the following procedure to each one:
- Split the name into a vector of characters
- Use the built-in constant LETTERS, a character vector whose positions 1 to 26 hold the uppercase letters A to Z, to assign a numeric score to each letter that appears in the name. The which() function returns the index at which each character appears in LETTERS, and that index is the same as the letter’s score.
- Sum the scores assigned to each letter, and then multiply the sum by the name’s numeric rank in names.sorted. Then append this value to a vector y.
After the loop, the function sum(y) takes the sum of all the values in the vector y, which is the answer to the question.
Here’s the code for the first solution:
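names <- read.csv("names2.csv", stringsAsFactors=FALSE, header=FALSE, na.strings="")  # na.strings="" so a name spelled NA is not read as missing
names.v <- as.vector(as.matrix(names))         # flatten the data frame into a character vector
names.sorted <- sort(names.v)

y <- 0                                         # name scores (the leading 0 does not affect the sum)
z <- 0                                         # rank of the current name
for(i in names.sorted){
  x <- 0                                       # letter values of the current name
  z <- z + 1
  for(j in strsplit(i, c())[[1]]){             # split the name into single characters
    x <- append(x, which(LETTERS == j))        # position in LETTERS = letter value
  }
  y <- append(y, sum(x) * z)                   # name value times rank
}
sum(y)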
Solution 2
After solving the problem, I decided to write an alternative solution that would reduce the number of variables declared. I used sapply() to apply a single function to every character of a name, replacing the inner loop, and stored each name’s total in a vector called names.score:
names <- read.csv("names2.csv", stringsAsFactors=FALSE, header=FALSE, na.strings="")
names.score <- c()
for(i in sort(as.vector(as.matrix(names)))){
  names.score[i] <- sum(sapply(strsplit(i, c())[[1]], function(x) which(LETTERS == x)))  # value of name i
}
sum(names.score * seq_along(names.score))  # multiply each value by its rank, then total
This method allowed me to remove one of the loops and to remove the variables names.v, y and z. This reduced the number of lines of code from 15 to 6.
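To try the second approach without the names file, here is a minimal sketch that runs the same loop on a small in-memory vector. The vector toy.names and the other toy.* variables are hypothetical stand-ins for the real data, not part of my original solution.

```r
# Hypothetical toy data standing in for the contents of the names file
toy.names <- c("MARY", "ANNA", "BOB")

toy.score <- c()
for (i in sort(toy.names)) {
  # Indexing by the name itself appends a named element to toy.score
  toy.score[i] <- sum(sapply(strsplit(i, "")[[1]], function(x) which(LETTERS == x)))
}

toy.score                                  # ANNA = 30, BOB = 19, MARY = 57
toy.total <- sum(toy.score * seq_along(toy.score))
toy.total                                  # 30*1 + 19*2 + 57*3 = 239
```

One thing to keep in mind: because the vector is indexed by name, a repeated name would overwrite its earlier entry, so this form assumes the names are unique.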
Solution 3
I then found out I could further reduce the solution to just 2 lines of code by using nested sapply() functions over the names variable:
names <- sort(as.vector(as.matrix(read.csv("names2.csv", stringsAsFactors=FALSE, header=FALSE, na.strings=""))))
sum(sapply(names, function(i) sum(sapply(strsplit(i, c())[[1]], function(x) which(LETTERS == x)))) * seq_along(names))
Here, I got rid of the names.score variable and declared only a single variable. The nested sapply() calls first iterate over each element of the vector names and then, within each element, over its characters. The resulting vector of name values is multiplied element-wise by each name’s position in the sorted list, and the outer sum() adds up those products to give the answer.
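To make the nesting easier to follow, here is a rough sketch that pulls the one-liner apart on a tiny in-memory vector and keeps the intermediate results. The toy.* names and the three sample names are hypothetical, chosen only so the numbers are easy to check by hand.

```r
# Hypothetical toy data, sorted up front as in the two-line solution
toy.names <- sort(c("ZOE", "COLIN", "ABE"))

# Inner sapply(): letter values within one name; outer sapply(): one total per name
toy.values <- sapply(toy.names, function(i) {
  sum(sapply(strsplit(i, "")[[1]], function(x) which(LETTERS == x)))
})
toy.values                            # ABE = 8, COLIN = 53, ZOE = 46

toy.ranks <- seq_along(toy.names)     # 1 2 3
toy.total <- sum(toy.values * toy.ranks)
toy.total                             # 8*1 + 53*2 + 46*3 = 252
```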
As you can see, R comes with some neat features that are great for condensing your code. However, there are tradeoffs: the first solution is very easy to read, whereas the last may be difficult to follow, especially for people who are not familiar with R. Loops are easy to spot in most widely used languages, so someone who knows C++ but not R should be able to read the first solution. To understand the last one, they may have to look up what the sapply() function does. Personally, my favorite is the second solution, which I think strikes a good balance between being compact and being easy to comprehend.