• Home
  • Readings
  • Github
  • MIES
  • TmVal
  • About
Gene Dan's Blog

No. 96: Project Euler #22 – Three Solutions

14 August, 2013 9:18 PM / Leave a Comment / Gene Dan

I’ve been using R a lot more at work lately, so I have decided to switch languages from VBA to R for my attempts at Project Euler problems. As of today I’ve solved 25 problems, 8 in the last day. I’ve found that R is much more powerful than VBA, especially with respect to handling vectors and arrays via indexing.

Here is the problem as stated:

Using names.txt, (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.

What is the total of all the name scores in the file?

The problem asks you to download a file containing a very long list of names, sort the names, and then assign each of the names a score based on their character composition and rank within the list. You are then asked to take the sum of all the scores.

The problem requires you to manipulate a long list of names.

The problem requires you to manipulate a long list of names.

Solution 1

My first solution consists of 15 lines. First, I imported the text file via read.csv() and assigned the sorted values to a vector called names.sorted. I then ran a loop iterating over each of the names, applying the following procedure to each one:

  1. Split the name into a vector of characters
  2. Use the built-in dataset LETTERS which is already indexed from 1-26 to assign a numeric score to each letter that appears in the name. The which function is used to match the characters of each name to the index (the value of which is the same as the score) at which it appears in the dataset LETTERS.
  3. Sum the scores assigned to each letter, and then multiply the sum by the name’s numeric rank in names.sorted. Then append this value to a vector y.

After the loop, the function sum(y) takes the sum of all the values in the vector y, which is the answer to the question.
Here’s the code for the first solution:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.v<-as.vector(as.matrix(names))
names.sorted <- sort(names.v)
 
y <- 0
z <- 0
for(i in names.sorted){
  x <- 0
  z <- z+1
  for(j in strsplit(i,c())[[1]]){
    x <- append(x,which(LETTERS==j))
  }
  y <- append(y,sum(x)*z)
}
sum(y)

Solution 2

After solving the problem, I decided to write an alternative solution that would reduce the number of variables declared. I used the function sapply to apply a single function over the entire vector names.score:

1
2
3
4
5
6
names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.score<-c()
for(i in sort(as.vector(as.matrix(names)))){
  names.score[i] <- sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x)))
}
sum(names.score*seq(1:length(names.sorted)))

This method allowed me to remove one of the loops and to remove the variables names.v, y and z.  This reduced the number of lines of code from 15 to 6.

Solution 3

I then found out I could further reduce the solution to just 2 lines of code by using nested sapply() functions over the names variable:

1
2
names <-sort(as.vector(as.matrix(read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings=""))))
sum(sapply(names,function(i)sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x))))*seq(1:length(names)))

Here, I got rid of the names.score variable and only declared a single variable. The nested sapply() functions are used to first iterate over each element of the vector names, and second, to iterate over each character within those elements of the vector. The sum() function is wrapped around the nested sapply() functions which produces the solution by summing the scores of the individual names.

As you can see, R comes with some neat features that great for condensing your code. However, there are some tradeoffs as the first solution is very easy to read, whereas the last solution may be difficult for people to read, especially if they are not familiar with R. Loops are quite easy to spot in most widely used languages, so someone who knows C++ but not R should be able to read it. In order to understand the last solution, they may have to look up what the sapply() function does. Personally my favorite is the second solution, which I think has a good balance between being compact and being easy to comprehend.

Posted in: Uncategorized / Tagged: Names scores, project euler, project euler 22

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Post Navigation

← Previous Post
Next Post →

Archives

  • September 2023
  • February 2023
  • January 2023
  • October 2022
  • March 2022
  • February 2022
  • December 2021
  • July 2020
  • June 2020
  • May 2020
  • May 2019
  • April 2019
  • November 2018
  • September 2018
  • August 2018
  • December 2017
  • July 2017
  • March 2017
  • November 2016
  • December 2014
  • November 2014
  • October 2014
  • August 2014
  • July 2014
  • June 2014
  • February 2014
  • December 2013
  • October 2013
  • August 2013
  • July 2013
  • June 2013
  • March 2013
  • January 2013
  • November 2012
  • October 2012
  • September 2012
  • August 2012
  • July 2012
  • June 2012
  • May 2012
  • April 2012
  • March 2012
  • February 2012
  • January 2012
  • December 2011
  • September 2011
  • August 2011
  • July 2011
  • June 2011
  • January 2011
  • December 2010
  • October 2010
  • September 2010
  • August 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • September 2009
  • August 2009
  • May 2009
  • December 2008

Categories

  • Actuarial
  • Cycling
  • Logs
  • Mathematics
  • MIES
  • Music
  • Uncategorized

Links

Cyclingnews
Jason Lee
Knitted Together
Megan Turley
Shama Cycles
Shama Cycles Blog
South Central Collegiate Cycling Conference
Texas Bicycle Racing Association
Texbiker.net
Tiffany Chan
USA Cycling
VeloNews

Texas Cycling

Cameron Lindsay
Jacob Dodson
Ken Day
Texas Cycling
Texas Cycling Blog
Whitney Schultz
© Copyright 2025 - Gene Dan's Blog
Infinity Theme by DesignCoral / WordPress