Gene Dan's Blog

Monthly Archives: August 2013


No. 99: Eratosthenes’ Sieve

26 August, 2013 8:29 PM / Leave a Comment / Gene Dan

A friend of mine pointed out that I had skipped a few steps in the solution I posted yesterday for Euler #3 – first, I didn’t actually go through the process of finding the primes, and second, I didn’t try to figure out how many prime numbers would be necessary for the list I used to find the largest prime factor of 600,851,475,143. To rectify these issues, I wrote another program that can provide the general solution for any integer larger than 1:

prime <- 2
primes <- c()                 # primes found so far, used for the primality test below
remaining.factor <- 600851475143
largest.factor <- 1
while(remaining.factor > 1){
  while(remaining.factor %% prime == 0){
    remaining.factor <- remaining.factor/prime
    largest.factor <- prime   # the last prime divided out is the largest
  }
  primes <- append(primes,prime)
  # advance to the next prime: skip integers divisible by any prime found so far
  while(any(prime %% primes == 0)){
    prime <- prime + 1
  }
}
largest.factor

All you have to do is replace 600,851,475,143 with a number of your choice, and the script above will find the largest prime factor for that number, given enough time and memory. I was actually somewhat lucky that the answer ended up being 6857. Had it been larger, the program might have taken much longer to execute (possibly impractically long if the factor happened to be big enough).
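To make swapping numbers in and out easier, the same trial-division idea can be wrapped in a function. This is just a sketch; the name `largest.prime.factor` is my own:

```r
# Largest prime factor by repeated trial division.
# Each factor is divided out as soon as it is found, so a composite trial
# divisor can never divide n (its prime factors are already gone).
# The last divisor divided out is therefore the largest prime factor.
largest.prime.factor <- function(n) {
  divisor <- 2
  largest <- 1
  while (n > 1) {
    while (n %% divisor == 0) {
      n <- n / divisor
      largest <- divisor
    }
    divisor <- divisor + 1
  }
  largest
}

largest.prime.factor(13195)   # 29, matching the example in Euler #3
```

Because factors are divided out immediately, the trial divisor never needs to exceed the answer itself, which is why the run finishes quickly when the largest factor is small.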

Eratosthenes’ Sieve

Now that I have that out of the way, I would like to demonstrate a method for quickly generating prime numbers, called Eratosthenes’ Sieve. You can embed this algorithm in the solution of any Project Euler problem that calls for a large list of prime numbers under 10 million (beyond 10 million, the algorithm takes a long time to execute, and other sieves may be a better choice). While it’s possible to generate prime numbers on the fly for Euler problems, I still prefer to use lists to solve them. Below is the algorithm for Eratosthenes’ Sieve that I’ve written in R:

primes <- 2:10000000            # candidate list: the integers from 2 to 10 million

curr.prime <- 2
while(curr.prime < sqrt(10000000)){
  # zero out the multiples of the current prime, starting at its square
  # (smaller multiples were already zeroed by smaller primes)
  primes[(primes >= curr.prime^2) & (primes %% curr.prime == 0)] <- 0
  # the smallest surviving number above curr.prime is the next prime
  curr.prime <- min(primes[primes > curr.prime])
}

primes <- primes[primes != 0]   # drop the zeroed composites

The script above will generate a list of all the primes below 10,000,000. The algorithm starts with a list of integers from 2 to 10,000,000. Starting with 2, the smallest prime, you mark all the larger multiples of 2 in the list as zero. Then you move on to the next unmarked number, 3, which must be prime, and mark all of its larger multiples as zero. The process continues until the only nonzero numbers left in the list are primes.


As you can see, all the composite numbers are now marked as zeros. The final line of the script removes these zeros and gives you a list of just the prime numbers.
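The sieve can likewise be packaged as a function with an adjustable upper limit. Again a sketch, with a name (`sieve`) of my own choosing; note the loop condition uses `<=` on the square root so that limits that are perfect squares of primes (like 25) are sieved correctly:

```r
# Sieve of Eratosthenes: zero out composite entries, then drop the zeros.
sieve <- function(limit) {
  nums <- 2:limit
  p <- 2
  while (p <= sqrt(limit)) {
    # mark multiples of p, starting at p^2 (smaller multiples were
    # already marked when sieving with smaller primes)
    nums[(nums >= p^2) & (nums %% p == 0)] <- 0
    # the smallest surviving number above p is the next prime
    p <- min(nums[nums > p])
  }
  nums[nums != 0]
}

sieve(30)   # 2 3 5 7 11 13 17 19 23 29
```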


Posted in: Mathematics / Tagged: eratosthenes sieve, eratosthenes sieve in r, project euler 3, project euler 3 r, R

No. 98: Project Euler #3 – Largest Prime Factor (solution in R)

25 August, 2013 1:15 PM / 1 Comment / Gene Dan

Question:

The prime factors of 13195 are 5, 7, 13 and 29.

What is the largest prime factor of the number 600851475143 ?

Approach:

Every integer greater than one has a unique prime factorization. This means that each integer greater than one can be represented as a unique product of prime numbers. For example, the number 5,784 can be written as:

\[2^3\times 241\]

where 2 and 241 are prime. Likewise, the number 54,887,224 can be written as:

\[2^3\times 7 \times 53 \times 18493\]

where 2, 7, 53, and 18493 are prime. For this problem, our goal is to find the largest prime factor of the number 600,851,475,143. An interesting thing to note is that the mathematician F. N. Cole, back in 1903, found the prime factorization of the number 147,573,952,589,676,412,927, which is 193,707,721 × 761,838,257,287. It took him three years’ worth of effort to do so. While our number is not quite as large as Cole’s, such problems are trivial today: computer algebra systems such as Mathematica can factorize large numbers very quickly, and what took Cole three years now takes less than a second.

We have some discretion in our approach to solving this problem. You could probably use Mathematica to solve it with a single line of code; on the other hand, if you’re using a bare programming language like C++, you would probably need to write more code. Your choice depends on what you want to learn from solving the problem. I had previously solved this problem using VBA, which involved first creating a list of prime numbers, using the list to find the prime factors of 600,851,475,143, and then selecting the largest one as my answer. With R, I’ve taken a middle approach, compromising between arriving at an immediate answer with a computer algebra system and having to create a list of primes from scratch. In this case, I used a list of primes I found on a website and saved into a file called primes.csv. My mental outline is as follows:

1. Obtain a list of prime numbers
2. Find which of these prime numbers divide 600,851,475,143 evenly.
3. Of the primes that divide 600,851,475,143 evenly, find the largest one.

Solution:

primes <- sort(as.vector(as.matrix(read.csv("primes.csv",header=FALSE))))
max(primes[which(600851475143 %% primes == 0)])

The first line does five things. First, it imports the list of prime numbers as a data frame, then converts the data frame into a matrix, then converts the matrix into a vector, and then sorts the primes within the vector in ascending order. This vector is then assigned to the variable primes. The second line accomplishes three things. The which() function returns the indices of the elements that divide 600,851,475,143 evenly. These indices are then used to extract the corresponding elements of the vector primes. Finally, the function max() returns the largest of these primes, giving us the answer, 6857.
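To see the indexing idiom in isolation, here is the same one-liner applied to the smaller number 13195 from the problem statement, with a hand-typed prime list (my own, standing in for primes.csv):

```r
# A short hand-typed prime list replaces the primes.csv import
small.primes <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
max(small.primes[which(13195 %% small.primes == 0)])   # 29
```

The `which()` call picks out 5, 7, 13, and 29 as the primes dividing 13195 evenly, and `max()` returns 29 — matching the factorization given in the question.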

I thought this approach would avoid the tedium of having to calculate each prime myself, while not being so trivial that I could find the answer without learning anything. I also thought this method was a good way to demonstrate the use of indexing to find the solution.

Posted in: Mathematics / Tagged: largest prime factor, prime factorization, project euler, project euler 3, project euler 3 r, solution in r

No. 97: Project Euler #2 – Even Fibonacci Numbers

19 August, 2013 6:54 PM / Leave a Comment / Gene Dan

Question:

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

Background:

There are some interesting things to note about Fibonacci. Students often get their first introduction to the Fibonacci numbers when they learn about sequences. The popular story is that Fibonacci used them to model rabbit populations over time – you start with one pair of rabbits, denoted by 1 as the first term of the sequence. After a month, this pair of rabbits produces another pair, and now the total number of pairs is two. After another month, these two pairs produce another pair, resulting in a total of three pairs after two months. After three months, these three pairs produce two additional pairs, resulting in a total of five. The sequence then continues indefinitely, and students are often surprised at how quickly the terms grow.

Beyond the Fibonacci numbers, there are some more interesting things about the mathematician for whom they are named, and how he fits into the history of mathematics. While the Fibonacci numbers may be the most widespread way in which people are aware of Fibonacci’s existence, they are not (at least in my opinion) his most important contribution to western mathematics. Although the Fibonacci numbers appear in his book, Liber Abaci, the book also introduces the Hindu-Arabic number system, and it played a crucial role in popularizing the Arabic numerals – the numbers we use today – in western society. Prior to that, Europeans relied on Roman numerals, which were cumbersome for calculation. Even though the Arabic numerals appeared in Liber Abaci in 1202, it took another 400 years until they became ubiquitous throughout Europe, aided by the invention of the printing press.

Approach:

This problem is fairly straightforward. The description already tells you how to calculate the Fibonacci numbers, and simply asks you to find the sum of the even-valued terms that do not exceed four million. With some time and effort, you could do the problem by hand, although there is more educational value (and efficiency!) in solving it with a computer program. When I start on these problems, I first create a mental outline of the solution, typically starting with brute force; if I realize that a brute-force solution would take too long, I try to come up with an indirect approach that may be more efficient. In this case the problem is simple, so I decided to use brute force. Here are the steps in my mental outline:

  1. Generate the sequence of all Fibonacci numbers that are less than 4,000,000.
  2. Determine which numbers of the sequence are even numbers, then extract this subset.
  3. Add the subset of these even numbers. This will be the solution.

Solution 1:

x <- c(1,2)
k = 3
i = 3
while(k<=4000000){
  x <- append(x,k)
  k = k + x[i-1]
  i = i +1
}
sum(x[x%%2==0])

For this solution, I first created a vector x containing the two seed values, 1 and 2. This vector stores the sequence of Fibonacci numbers under 4 million. Next, I declared the variables k and i, both starting at 3: k holds the value of the next term to append, and i indexes the vector x inside the while loop that calculates the terms of the sequence. I used a while loop because I did not know, at the time of writing the solution, how many terms fall below 4 million. The while loop keeps appending terms to the vector x so long as they do not exceed four million. Once the value of the next term exceeds four million, the program exits the loop and moves on to the next line after the loop.

Within the loop itself, you see the line x <- append(x,k). This tells the program to append the value of k=3 to the vector x, or in other words, to append the third term of the Fibonacci sequence. The next line, k = k + x[i-1], sets k equal to its current value plus the value of the second to last element of the vector x (denoted by i-1, since there are currently i elements of the vector x). In other words, set k = 5. Then i = i +1 tells the program to increment i by 1, or in other words, set i equal to 4, which will be used to index the vector x during the loop’s next iteration when there will be four elements in the vector x.

During the next iteration of the loop, k is now equal to 5, i is now equal to 4. Append k to x so that the sequence becomes (1,2,3,5). Then set k = 5+3 = 8. Then set i = 4+1 = 5. The loop continues until all the terms of the Fibonacci sequence under 4 million are contained in x.
The last line of the program tells the computer to take the sum of all the even elements of x. The expression x[x %% 2 == 0] extracts the subset of x whose elements 2 divides evenly. The %% operator is the modulo operator, which returns the remainder when each element of x is divided by 2; a remainder of 0 means 2 divides that element evenly. The sum over this subset is the answer, 4,613,732.
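To see the even-number filter in isolation, here it is applied to just the ten terms listed in the question:

```r
# The ten Fibonacci terms given in the problem statement
x <- c(1, 2, 3, 5, 8, 13, 21, 34, 55, 89)
x[x %% 2 == 0]        # the even-valued terms: 2 8 34
sum(x[x %% 2 == 0])   # their sum: 44
```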

Solution 2:

x <- c(1,2)
k = 3
while(k<=4000000){
  x <- append(x,k)
  k = k + max(x[x<k])
}
sum(x[x%%2==0])

In this solution, I was able to remove the variable i, by noticing that it was sufficient to calculate the next term of the Fibonacci sequence by adding the largest two terms of the vector x. This calculation is represented by the line k = k + max(x[x<k]).

Solution 3:

x <- c(1,2)
while(max(x)+max(x[x<max(x)])<=4000000){
  x <- append(x,max(x)+max(x[x<max(x)]))
}
sum(x[x%%2==0])

Here, I was able to shorten the solution to five lines by removing the variable k: I simply tell the program to append to x the sum of the two largest values currently in x. This calculation is represented by the line x <- append(x,max(x)+max(x[x<max(x)])).

Posted in: Logs, Mathematics / Tagged: even fibonacci numbers, fibonacci, fibonacci numbers, project euler #2, project euler 2 R, R

No. 96: Project Euler #22 – Three Solutions

14 August, 2013 9:18 PM / Leave a Comment / Gene Dan

I’ve been using R a lot more at work lately, so I have decided to switch languages from VBA to R for my attempts at Project Euler problems. As of today I’ve solved 25 problems, 8 in the last day. I’ve found that R is much more powerful than VBA, especially with respect to handling vectors and arrays via indexing.

Here is the problem as stated:

Using names.txt, (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.

What is the total of all the name scores in the file?

The problem asks you to download a file containing a very long list of names, sort the names, and then assign each of the names a score based on their character composition and rank within the list. You are then asked to take the sum of all the scores.
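As a quick sanity check of the scoring rule, the worked example from the question (COLIN = 53) can be reproduced with R’s built-in LETTERS vector. The variable names here are my own:

```r
# Score one name: split it into letters, look each letter up in the
# built-in LETTERS vector (indexed 1-26), and sum the positions
name <- "COLIN"
letter.values <- sapply(strsplit(name, "")[[1]], function(ch) which(LETTERS == ch))
sum(letter.values)   # 3 + 15 + 12 + 9 + 14 = 53
```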

The problem requires you to manipulate a long list of names.

Solution 1

My first solution consists of 15 lines. First, I imported the text file via read.csv() and assigned the sorted values to a vector called names.sorted. I then ran a loop iterating over each of the names, applying the following procedure to each one:

  1. Split the name into a vector of characters
  2. Use the built-in dataset LETTERS which is already indexed from 1-26 to assign a numeric score to each letter that appears in the name. The which function is used to match the characters of each name to the index (the value of which is the same as the score) at which it appears in the dataset LETTERS.
  3. Sum the scores assigned to each letter, and then multiply the sum by the name’s numeric rank in names.sorted. Then append this value to a vector y.

After the loop, the function sum(y) takes the sum of all the values in the vector y, which is the answer to the question.
Here’s the code for the first solution:

names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.v<-as.vector(as.matrix(names))
names.sorted <- sort(names.v)
 
y <- 0
z <- 0
for(i in names.sorted){
  x <- 0
  z <- z+1
  for(j in strsplit(i,c())[[1]]){
    x <- append(x,which(LETTERS==j))
  }
  y <- append(y,sum(x)*z)
}
sum(y)

Solution 2

After solving the problem, I decided to write an alternative solution that would reduce the number of variables declared. I used the function sapply() to apply the scoring function across the characters of each name, accumulating the results in the vector names.score:

names<-read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings="")
names.score<-c()
for(i in sort(as.vector(as.matrix(names)))){
  names.score[i] <- sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x)))
}
sum(names.score*seq_along(names.score))

This method allowed me to remove one of the loops, along with the variables names.v, y, and z. This reduced the number of lines of code from 15 to 6.

Solution 3

I then found out I could further reduce the solution to just 2 lines of code by using nested sapply() functions over the names variable:

names <-sort(as.vector(as.matrix(read.csv("names2.csv",stringsAsFactors=FALSE,header=FALSE,na.strings=""))))
sum(sapply(names,function(i)sum(sapply(strsplit(i,c())[[1]],function(x) which(LETTERS==x))))*seq_along(names))

Here, I got rid of the names.score variable and only declared a single variable. The nested sapply() functions are used to first iterate over each element of the vector names, and second, to iterate over each character within those elements of the vector. The sum() function is wrapped around the nested sapply() functions which produces the solution by summing the scores of the individual names.

As you can see, R comes with some neat features that are great for condensing your code. However, there are tradeoffs: the first solution is very easy to read, whereas the last may be difficult for people who are not familiar with R. Loops are easy to spot in most widely used languages, so someone who knows C++ but not R should still be able to follow the first solution; to understand the last one, they might have to look up what the sapply() function does. Personally, my favorite is the second solution, which I think strikes a good balance between being compact and being easy to comprehend.

Posted in: Uncategorized / Tagged: Names scores, project euler, project euler 22

No. 95: My Thoughts on Computer Literacy

12 August, 2013 9:12 PM / Leave a Comment / Gene Dan

Last week, I ran across an article on Hacker News. It was written by a school teacher who bemoaned the lack of computer literacy amongst today’s youth. The basic idea is that despite the stereotype that kids these days are tech-savvy computer wizards, the opposite is actually true – kids can’t really use computers. The author contends that the increasing ease of use of computers, along with the increase in the consumption of entertainment media amongst today’s youth, has brought about a corresponding decline in their ability to understand how computers work and how to use them to solve problems.

I personally thought the blog post was one of the best pieces of news I had read all week, but based on the comments I’ve seen it looks like many people thought otherwise. Most of the posters on Reddit thought the author was a stereotypical, arrogant IT nerd, whereas most of the users on Hacker News were more supportive. In the midst of the author’s bitter ranting, I think many of the readers overlooked what I thought was the most important piece of his article:

Tomorrow’s politicians, civil servants, police officers, teachers, journalists and CEOs are being created today. These people don’t know how to use computers, yet they are going to be creating laws regarding computers, enforcing laws regarding computers, educating the youth about computers, reporting in the media about computers and lobbying politicians about computers.

I think the main flaw of the article is that the author spent too much time ranting about people’s struggles with basic troubleshooting and not enough time expanding on technological illiteracy’s greater impact on society regarding the freedom of speech, censorship, and governmental control. Nevertheless, I encourage anyone who is interested to read the whole thing.

My Thoughts

1. Breaking Stuff

When I was growing up, I was lucky enough to have access to computers at a time when they cost $10k per unit. However, as a child, I never put much thought into what computers could do other than serve as a station on which I could play video games. I had a vague idea that they could be used for more important things, but I didn’t have any understanding as to what those things were. All I knew was that Dad would leave for work at 5:00 in the morning, do things on a computer he used at work, and then return home, and as a result of that activity we were able to afford things like food and shelter. What I do remember is that, like most kids, I would mess around with stuff at home, and computers were no different – I would fiddle with the settings and configurations, install random software, and so on, which often led to system failures and viruses. As a result, my Dad would (understandably) get angry, spend many hours restoring the machine to its pristine state, and give me a stern warning not to screw it up again.

That tone continued throughout high school and college – my parents provided me with most of what I needed, using their hard-earned money to pay for it. As a result, I was discouraged from experimenting with our possessions (as experimentation usually led to breakage). After college, I started having an interest in computers, and when I became financially independent, one of the first things I did was buy a whole bunch of computer parts and put together a machine by myself. Overall, the process was pretty easy, but I did make some mistakes on my first try – I accidentally destroyed my CPU and broke off some parts of my original motherboard, resulting in me spending hundreds of dollars replacing the parts. However, through these mistakes I became better at assembling computers, and became more careful while handling the parts. Furthermore, since I had my own financial stake in the equipment, I paid more attention to maintenance and I put more effort into repairing things when they stopped working.

I think kids should be encouraged to mess around with their possessions, on the condition that they learn how to repair them if they accidentally break them. This approach requires a bit more financial investment as parts need to be replaced if all options of repair have been exhausted. Because of this, replacement should only be done if it’s an absolute necessity – otherwise if kids knew they’d get everything they break replaced, they wouldn’t try as hard to fix the things they already have. I believe that breaking things – and subsequently repairing things – teaches people the limits of what their possessions can and can’t do, and under appropriate constraints, gives them more control, knowledge, and appreciation of their belongings.

2. The Spread of Ideas

The internet is a wonderful thing. It has given us access to information that we would otherwise have had to spend lots of money, travel great distances, or wait extraordinarily long periods of time to obtain. The internet has not only greatly reduced the cost of knowledge but also the cost of communication – it is no longer necessary for people of similar interests to meet in a single physical location to share their ideas. For instance, suppose you were an expert in a particular area of mathematics working on a problem that perhaps only you and five other people could understand. Without the internet, you might never have known about those other people’s existence; now the internet allows such experts to collaborate with each other and solve problems.

When I was growing up, I was interested in a number of subjects that I simply did not have access to. The local library’s selection was sparse and didn’t contain the appropriate volumes for my reading level. Now that internet use has become ubiquitous, I can find much of the information I want for free, and if I wanted to obtain a copy of a book on an academic subject such as Topology, Analysis, or Computer Science, I can choose from pretty much any book that’s currently in print. Rare books – such as George Carr’s Formulas and Theorems in Pure Mathematics and even the Voynich Manuscript (of which there is only one original copy), are available for free.

What this means is that the internet, driven by computers, not only allows people to acquire knowledge, share ideas, and solve problems incredibly quickly, but also increases opportunity for the common person. For instance, if I wanted to find someone who could build me specialized equipment for an experiment, I could do it pretty easily via the internet – this allows the builder to find a customer who’s willing to pay for his services, and me to obtain the equipment I need. Easy access to communication makes both parties better off.

3. On Censorship, Freedom of Speech, and Control

Our ability to communicate not only dictates how we share ideas, but also our ability to think. In our society, we elect officials who enact laws governing our nation’s telecommunications infrastructure. The level of technological literacy amongst the population not only directly affects the pool of competent candidates from which to choose, but also our ability to determine which candidates are technologically competent enough to make decisions regarding what we can and cannot do with our computers.

There are many dimensions to this problem. First, a low level of technological literacy means that knowledge – and hence power – resides within the hands of a few people. Concentrated knowledge allows the people who hold this knowledge to take advantage of those who are ignorant. People who are not aware of how far technology permeates society aren’t thinking about how technology can be used to control our thoughts and our actions, or how it could be used to imprison people and deny them a right to a fair trial. Second, a low level of technological literacy means that society cannot accurately judge a leader’s competence regarding technical matters. We could very well, through complete ignorance, elect a leader who cannot make the right decisions when it comes to enacting laws. Third, technological illiteracy hinders our ability to prioritize how we invest our resources into our economy and society. For the reasons stated in (2), a fast and reliable communications network improves the knowledge base of the populace and allows them to take action to better their own lives through the sharing of ideas, along with the exchange of physical goods and services.

And finally, on censorship. Humans communicate with each other via mutually intelligible signals, such as language. Language allows people to transmit ideas across physical distances, and with the aid of encryption, they are able to do so very securely, without fear of interception. The freedom of thought and speech is critical to our survival – it allows us to comprehend how an authority figure might be abusing its power – and when such abuses are discovered, it allows the discoverer to transmit information about them across a communications network, from which it can be acquired and digested by the populace. Access to such information, granted through the freedom of communication, allows people to collaborate, take action against such abuses, and hence improve their own circumstances.

So why does technological literacy play such a crucial role with respect to the freedom of speech? Most people regard the freedom of speech as important, but they aren’t thinking hard about why they ought to learn about technology in order to defend it. Technological literacy not only allows us to understand what technology can and cannot do, but also gives us a realistic picture of the current challenges we face when it comes to things like cybersecurity and regulation. There is a lot of fear propagated by the media about cyber attacks, malware, hackers, and so on – and such fear is often used as an excuse to enact laws that curtail our freedom. Censorship forbids us from transmitting certain types of information – and hence certain types of thought – across cyberspace. It restricts access to information and the sharing of ideas, and from there the sharing of goods and services, along with the ability of people to collaborate with one another for their own good. With censorship, social progress comes to a halt. It is therefore to our benefit to promote literacy amongst the population, to enable ourselves to take action against what we see as unjust, and to protect our freedom to better our own lives.

Posted in: Logs
