I've been using R a lot more at work lately, so I have decided to switch languages from VBA to R for my attempts at Project Euler problems. As of today I've solved 25 problems, 8 in the last day. I've found that R is much more powerful than VBA, especially with respect to handling vectors and arrays via indexing.

Here is the problem as stated:

Using names.txt, (right click and 'Save Link/Target As...'), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.

What is the total of all the name scores in the file?

The problem asks you to download a file containing a very long list of names, sort the names, and then assign each of the names a score based on their character composition and rank within the list. You are then asked to take the sum of all the scores.

**Solution 1**

My first solution consists of 15 lines. First, I imported the text file via **read.csv()** and assigned the sorted values to a vector called **names.sorted**. I then ran a loop iterating over each of the names, applying the following procedure to each one:

- Split the name into a vector of characters.
- Use the built-in constant vector **LETTERS**, which is already indexed from 1-26, to assign a numeric score to each letter that appears in the name. The **which** function matches each character of the name to the index at which it appears in **LETTERS**; that index is the letter's score.
- Sum the scores assigned to each letter, then multiply the sum by the name's numeric rank in **names.sorted**. Append this value to a vector **y**.
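As a quick check of the per-name steps, here is a minimal sketch that applies them to the single name COLIN from the problem statement. The rank 938 is taken from the problem text rather than computed, and the variable names `name`, `chars` and `name.score` are only illustrative.

```r
name  <- "COLIN"
chars <- strsplit(name, "")[[1]]        # split into single characters

x <- c()
for (j in chars) {
  # which() returns the position of the letter in LETTERS (A = 1, ..., Z = 26)
  x <- append(x, which(LETTERS == j))
}

x                                        # letter values: 3 15 12 9 14
sum(x)                                   # alphabetical value: 53
name.score <- sum(x) * 938               # rank 938 is quoted from the problem
name.score                               # 49714
```

Both numbers agree with the worked example in the problem statement.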

After the loop, the function **sum(y)** takes the sum of all the values in the vector **y**, which is the answer to the question.

Here's the code for the first solution:

```r
names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.v<-as.vector(as.matrix(names))
names.sorted <- sort(names.v)

y <- 0
z <- 0
for(i in names.sorted){
  x <- 0
  z <- z+1
  for(j in strsplit(i,c())[[1]]){
    x <- append(x,which(LETTERS==j))
  }
  y <- append(y,sum(x)*z)
}
sum(y)
```

**Solution 2**

After solving the problem, I decided to write an alternative solution that would reduce the number of variables declared. I used the function **sapply** to apply a single function to every character of a name, storing each name's value in the vector **names.score**:

```r
names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.score<-c()
for(i in sort(as.vector(as.matrix(names)))){
  names.score[i] <- sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x)))
}
sum(names.score*seq_along(names.score))
```

This method allowed me to remove one of the loops and to remove the variables **names.v**, **y** and **z**. This reduced the number of lines of code from 15 to 6.

**Solution 3**

I then found out I could further reduce the solution to just 2 lines of code by using nested **sapply()** functions over the **names** variable:

```r
names <-sort(as.vector(as.matrix(read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings=""))))
sum(sapply(names,function(i)sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x))))*seq_along(names))
```

Here, I got rid of the **names.score** variable and only declared a single variable. The nested **sapply()** functions are used first to iterate over each element of the vector **names**, and second to iterate over each character within those elements. The per-name values are multiplied by their ranks, and the **sum()** function wrapped around the whole expression produces the solution by summing the scores of the individual names.
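To see what each layer of the nested **sapply()** calls returns, here is a small sketch on a made-up three-name list. The names in `toy` are hypothetical stand-ins for the real file, and `inner`, `name.values` and `total` are illustrative variable names that do not appear in the solutions above.

```r
# Hypothetical toy input standing in for the sorted contents of the names file
toy <- sort(c("MARY", "COLIN", "ANNA"))   # "ANNA" "COLIN" "MARY"

# Inner sapply: one letter value per character of a single name
inner <- sapply(strsplit("COLIN", "")[[1]], function(x) which(LETTERS == x))
inner                                      # C = 3, O = 15, L = 12, I = 9, N = 14

# Outer sapply: one alphabetical value per name (a vector named by the names)
name.values <- sapply(toy, function(i)
  sum(sapply(strsplit(i, "")[[1]], function(x) which(LETTERS == x))))
name.values                                # ANNA = 30, COLIN = 53, MARY = 57

# Multiply each value by its rank in the sorted list, then add everything up
total <- sum(name.values * seq_along(toy)) # 30*1 + 53*2 + 57*3
total                                      # 307
```

Because **sapply()** returns an ordinary vector, the multiplication by rank is a single element-wise operation and needs no explicit loop.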

As you can see, R comes with some neat features that are great for condensing your code. However, there are tradeoffs: the first solution is very easy to read, whereas the last may be difficult to follow, especially for people who are not familiar with R. Loops are easy to spot in most widely used languages, so someone who knows C++ but not R should be able to read the first solution. To understand the last one, they may have to look up what the **sapply()** function does. Personally, my favorite is the second solution, which I think strikes a good balance between being compact and being easy to comprehend.