Tag Archives: Project Euler

No. 98: Project Euler #3 – Largest Prime Factor (solution in R)

25 August, 2013 1:15 PM / 1 Comment / Gene Dan

Question:

The prime factors of 13195 are 5, 7, 13 and 29.
What is the largest prime factor of the number 600851475143 ?

Approach:

Every integer greater than one has a unique prime factorization. This means that each integer greater than one can be represented as a unique product of prime numbers. For example, the number 5,784 can be written as:

\[2^3\times 241\]

Where 2 and 241 are prime. Likewise, the number 54,887,224 can be written as:

\[2^3\times 7 \times 53 \times 18493\]

Where 2, 7, 53, and 18493 are prime. For this problem, our goal is to find the largest prime factor of the number 600,851,475,143. An interesting thing to note is that the mathematician F. N. Cole, back in 1903, found the prime factorization of the number 147,573,952,589,676,412,927, which is 193,707,721 × 761,838,257,287. It took him three years worth of effort to do so. While our number is not quite as large as Cole’s, such problems today are trivial. Computer algebra systems such as Mathematica can factorize large numbers very quickly. What took Cole 3 years back in 1903 now takes less than a second today.

We have some discretion in our approach to solving this problem. You can probably use Mathematica to solve it with a single line of code. On the other hand, if you’re using a bare programming language like C++, you would probably need to write more code. Your choice on what to do depends on what you want to learn from solving the problem. I had previously solved this problem using VBA which first involved creating a list of prime numbers, using the list to find the prime factors of 600,851,475,143, and then selecting the largest one as my answer. With R, I’ve taken a middle approach, compromising between arriving at an immediate answer with a computer algebra system and having to create a list of primes from scratch. In this case, I used a list of primes I found off a website and saved them into a file called primes.csv. My mental outline is as follows:

1. Obtain a list of prime numbers
2. Find which of these prime numbers divide 600,851,475,143 evenly.
3. Of the primes that divide 600,851,475,143 evenly, find the largest one.

Solution:

primes <- sort(as.vector(as.matrix(read.csv("primes.csv",header=FALSE))))
max(primes[which(600851475143 %% primes == 0)])

1 2	primes <- sort(as.vector(as.matrix(read.csv("primes.csv",header=FALSE)))) max(primes[which(600851475143 %% primes == 0)])

The first line does five things. First, it imports the list of prime numbers as a data frame, then converts the data frame into a matrix, then converts the matrix into a vector, and then sorts the primes within the vector in ascending order. This vector is then assigned to the variable primes. The second line accomplishes three things. The which() function returns the indices of the elements that divide 600,851,475,143 evenly. These indices are then used to extract the corresponding elements of the vector primes. Finally, the function max() returns the largest of these primes, giving us the answer, 6857.

I thought this answer would avoid the tedium of having to calculate each prime myself, but was not so trivial that I could find the answer without learning anything. I thought this method was a good way to demonstrate the use of indexing to find the solution.

Posted in: Mathematics / Tagged: largest prime factor, prime factorization, project euler, project euler 3, project euler 3 r, solution in r

No. 96: Project Euler #22 – Three Solutions

14 August, 2013 9:18 PM / Leave a Comment / Gene Dan

I’ve been using R a lot more at work lately, so I have decided to switch languages from VBA to R for my attempts at Project Euler problems. As of today I’ve solved 25 problems, 8 in the last day. I’ve found that R is much more powerful than VBA, especially with respect to handling vectors and arrays via indexing.

Here is the problem as stated:

Using names.txt, (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.
For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 53 = 49714.
What is the total of all the name scores in the file?

The problem asks you to download a file containing a very long list of names, sort the names, and then assign each of the names a score based on their character composition and rank within the list. You are then asked to take the sum of all the scores.

The problem requires you to manipulate a long list of names.

Solution 1

My first solution consists of 15 lines. First, I imported the text file via read.csv() and assigned the sorted values to a vector called names.sorted. I then ran a loop iterating over each of the names, applying the following procedure to each one:

Split the name into a vector of characters
Use the built-in dataset LETTERS which is already indexed from 1-26 to assign a numeric score to each letter that appears in the name. The which function is used to match the characters of each name to the index (the value of which is the same as the score) at which it appears in the dataset LETTERS.
Sum the scores assigned to each letter, and then multiply the sum by the name’s numeric rank in names.sorted. Then append this value to a vector y.

After the loop, the function sum(y) takes the sum of all the values in the vector y, which is the answer to the question.
Here’s the code for the first solution:

names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.v<-as.vector(as.matrix(names))
names.sorted <- sort(names.v)

y <- 0
z <- 0
for(i in names.sorted){
  x <- 0
  z <- z+1
  for(j in strsplit(i,c())[[1]]){
    x <- append(x,which(LETTERS==j))
  }
  y <- append(y,sum(x)*z)
}
sum(y)

names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")

names.v<-as.vector(as.matrix(names))

names.sorted <- sort(names.v)

y <- 0

z <- 0

for(i in names.sorted){

x <- 0

z <- z+1

for(j in strsplit(i,c())[[1]]){

x <- append(x,which(LETTERS==j))

}

y <- append(y,sum(x)*z)

}

sum(y)

Solution 2

After solving the problem, I decided to write an alternative solution that would reduce the number of variables declared. I used the function sapply to apply a single function over the entire vector names.score:

names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.score<-c()
for(i in sort(as.vector(as.matrix(names)))){
  names.score[i] <- sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x)))
}
sum(names.score*seq(1:length(names.sorted)))

names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")

names.score<-c()

for(i in sort(as.vector(as.matrix(names)))){

names.score[i] <- sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x)))

}

sum(names.score*seq(1:length(names.sorted)))

This method allowed me to remove one of the loops and to remove the variables names.v, y and z. This reduced the number of lines of code from 15 to 6.

Solution 3

I then found out I could further reduce the solution to just 2 lines of code by using nested sapply() functions over the names variable:

names <-sort(as.vector(as.matrix(read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings=""))))
sum(sapply(names,function(i)sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x))))*seq(1:length(names)))

1 2	names <-sort(as.vector(as.matrix(read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")))) sum(sapply(names,function(i)sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x))))*seq(1:length(names)))

Here, I got rid of the names.score variable and only declared a single variable. The nested sapply() functions are used to first iterate over each element of the vector names, and second, to iterate over each character within those elements of the vector. The sum() function is wrapped around the nested sapply() functions which produces the solution by summing the scores of the individual names.

As you can see, R comes with some neat features that great for condensing your code. However, there are some tradeoffs as the first solution is very easy to read, whereas the last solution may be difficult for people to read, especially if they are not familiar with R. Loops are quite easy to spot in most widely used languages, so someone who knows C++ but not R should be able to read it. In order to understand the last solution, they may have to look up what the sapply() function does. Personally my favorite is the second solution, which I think has a good balance between being compact and being easy to comprehend.

Posted in: Uncategorized / Tagged: Names scores, project euler, project euler 22

No. 78: Project Euler #1

30 November, 2012 3:45 AM / 1 Comment / Gene Dan

Hey everyone,

I took a few weeks off posting in order to study for Exam MFE, which I took 4 weeks ago. I’m still waiting for the result, so in the meantime I’ve decided to knock out the CAS modules and finish up my SQL course before I have to start studying hard again. Anyway, I’ve finally gotten around to attempting the Project Euler problems that my friends, along with other actuaries, recommended to me as a good way to improve my programming skills.

Basically, Project Euler is a website that hosts mathematical problems that anyone can attempt to solve with whatever method they wish. Most of the problems are simply stated and can be understood by most people who have taken a basic course in number theory. However, they are often very computationally intensive if attempted directly so most people try to solve them by writing computer programs. Oftentimes students will use the site in an iterative fashion to practice their programming skills when picking up a new language. For example, one of my colleagues who has solved 80 problems first used the site to learn VBA, and then did the same problems again in R, and then again in C++.

For my first attempt, I decided to use VBA because I already know most of the basic syntax. Before starting, I decided that I would not seek any outside help or lookup any solutions when trying to solve these problems, because I want to work these out on my own. So please don’t give me any tips! (although I do appreciate any advice for any of my other projects).

The first problem is stated as follows:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.

Solution:

[code language=”vb”]
Sub euler1()

Dim x As Long ‘natural number to be tested
Dim y As Long ‘current sum of the multiples of 5 or 3

y = 0
For x = 1 To 999
If x Mod 3 = 0 Or x Mod 5 = 0 Then
y = y + x
End If
Next x

Debug.Print y

End Sub
[/code]

The final output is 233168. As you can see, the problem is quite easy to understand (and this one was very easy to solve as well) for anyone who passed grade school math. You can probably solve it by sitting at your desk and adding every multiple of 3 or 5 below 1000 by hand, but that would be extremely boring and you probably have better things to do. That’s why you write a program to solve the problem. The solution I wrote above is a simple brute force method (I tested every natural number below 1000 to see if it was evenly divided by 3 or 5) that took less than a second for my computer to solve. The key to understanding the solution is the mod operator, which returns the remainder in a division operation (for example 5 mod 3 equals 2). The statement “If x Mod 3 = 0 Or x Mod 5 = 0” tells the computer to execute the subsequent line, or to add x to the cumulative sum if either x mod 3 or x mod 5 equals 0, in other words, if the remainder after dividing x by 3 or 5 is 0.

So far I’ve solved 17 of the problems, mostly through brute force although I was able to solve one of them by hand. However, as the problems get progressively harder and more computationally intensive, brute force methods become less feasible and more creative thought is required. For example, problem 12 took my computer at least an hour to solve (I’m not sure how long it actually took because I left it on after I went to sleep). This means my solution wasn’t efficient, and that I ought to give it another shot (reduce the amount of iterative code, perhaps).

Posted in: Logs, Mathematics / Tagged: project euler, project euler #1, project euler vba, VBA

Tag Archives: Project Euler

No. 98: Project Euler #3 – Largest Prime Factor (solution in R)

No. 96: Project Euler #22 – Three Solutions

No. 78: Project Euler #1

Archives

Categories

Links

Texas Cycling