Gene Dan’s Blog

No. 76: Exploring Financial Data with Quantmod

6 October, 2012 1:36 AM / 1 Comment / Gene Dan

Hey everyone,

Things have been extremely busy lately – I feel like the quality of my posts will suffer, but I’ll try to put anything I find interesting here. I’m currently studying for my fifth actuarial exam – 3F/MFE – which covers models for financial economics. The exam covers the latter half of McDonald’s Derivatives Markets, a graduate textbook on the basics of financial derivatives – such as forwards and options – that introduces the reader to the elementary stochastic processes, like random walks and Brownian motion, used to price these derivatives theoretically.

Anyway, some snooping around on the internet for open-source financial software led me to some quantitative finance packages in R – namely Quantmod, which I’ll be discussing here. Unfortunately, it seems (correct me if I’m wrong) that the Quantmod project has been abandoned for some time, but its ease of use is my reason for showcasing it here. Other packages, such as QuantLib, appear to have more active involvement, but I currently lack the technical skills and financial knowledge to work with them – I may revisit them later.

The example above may be a familiar image to some of you – it is a chart depicting the CBOE VIX (volatility index), which measures the implied volatility of S&P 500 index options – in other words, it is an estimate of the short-term volatility of the stock market. Notice that 2008 saw high volatility – this makes sense, because the stock market plunged that year. This actually isn’t the highest value ever seen in the index – to my knowledge, that occurred on Black Monday in 1987, when the DJIA lost a quarter of its value in less than a day. The above chart was generated using Quantmod with the following code:

[sourcecode language="r"]
library("quantmod")
getSymbols("^VIX",src="yahoo")
barChart(VIX)
[/sourcecode]

Well, that was simple. Quantmod has a function called getSymbols() that extracts the desired data. The two arguments above (there are more – I used the defaults for the rest) specify the ticker symbol to extract (^VIX) and the source of the financial data (yahoo.com). The next function, barChart(), plots the data.
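getSymbols() will also take a date range, among other options. A minimal sketch – the from/to dates here are my own choice, not from the original post:

[sourcecode language="r"]
library("quantmod")
## from/to restrict the date range pulled from the Yahoo source
getSymbols("^VIX", src = "yahoo", from = "2007-01-01", to = "2012-10-05")
barChart(VIX)
[/sourcecode]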

[sourcecode language="text"]
> str(VIX)
An 'xts' object from 2007-01-03 to 2012-10-05 containing:
  Data: num [1:1453, 1:6] 12.2 12.4 11.8 12.5 11.9 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:6] "VIX.Open" "VIX.High" "VIX.Low" "VIX.Close" ...
  Indexed by objects of class: [Date] TZ:
  xts Attributes:
List of 4
 $ tclass : chr [1:2] "POSIXct" "POSIXt"
 $ tzone  : chr ""
 $ src    : chr "yahoo"
 $ updated: POSIXct[1:1], format: "2012-10-05 20:02:43"
[/sourcecode]

The output above tells us that the variable VIX is a time-series (xts) object spanning January 3rd, 2007 through October 5th, 2012. The following is the output of the first 10 rows of daily data:

[sourcecode language="text"]
> VIX[1:10]
           VIX.Open VIX.High VIX.Low VIX.Close VIX.Volume VIX.Adjusted
2007-01-03    12.16    12.75   11.53     12.04          0        12.04
2007-01-04    12.40    12.42   11.28     11.51          0        11.51
2007-01-05    11.84    12.25   11.68     12.14          0        12.14
2007-01-08    12.48    12.83   11.78     12.00          0        12.00
2007-01-09    11.86    12.47   11.69     11.91          0        11.91
2007-01-10    12.34    12.50   11.43     11.47          0        11.47
2007-01-11    11.42    11.48   10.50     10.87          0        10.87
2007-01-12    10.93    10.93   10.14     10.15          0        10.15
2007-01-16    10.64    10.89   10.40     10.74          0        10.74
2007-01-17    10.90    10.90   10.35     10.59          0        10.59
[/sourcecode]
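Since VIX is an xts object, it can also be sliced by ISO-8601 date strings rather than row numbers – a couple of examples of my own, not from the original post:

[sourcecode language="r"]
VIX["2008"]            ## every row from calendar year 2008
VIX["2008-09/2008-12"] ## September through December 2008
[/sourcecode]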

Here’s an example of a candle chart covering the last 90 days of the index:

A candle chart allows you to visualize the daily spread in addition to the trend over a longer period of time. In the above chart, the colored rectangles are bounded by the open and close values of the day, and the endpoints of the vertical line segments represent the high and low values of the day.

The function used to generate this chart is:

[sourcecode language="r"]
candleChart(VIX,multi.col=TRUE,theme="white",subset="last 90 days")
[/sourcecode]

The argument “subset” specifies the slice of time to be plotted in the chart. It’s nice because it accepts natural language – all I had to do was type in “last 90 days” to specify the time interval; a couple of other examples are shown below. Anyway, that’s it for now – in the meantime I’ll be spending up to 4-5 hours a day preparing for my test at the beginning of next month.
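Here are a couple of other subset values that should also work – again, my own examples rather than the post’s:

[sourcecode language="r"]
candleChart(VIX, multi.col=TRUE, theme="white", subset="last 6 months")
candleChart(VIX, multi.col=TRUE, theme="white", subset="2008")  ## calendar year 2008
[/sourcecode]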

Posted in: Logs, Mathematics

No. 75: A Training Update

11 September, 2012 1:30 AM / 1 Comment / Gene Dan

Hey everyone,

I feel like my training has progressed well over the last couple of months. Two weeks ago, I wrote that I would try to increase the time for which I could maintain 160-165 bpm to 60 minutes (2 x 30 minute intervals). I managed to complete this goal last Thursday without too much difficulty, though I was getting tired during the last 10 minutes of the second interval:

You can see that there were a couple of blips where I lost focus and my HR dropped below 160, but in the end the intervals averaged out to 163 bpm and 164 bpm, and I’m satisfied with the way things turned out. This week is a transition week for me, so I’ll still be riding, but the workouts will be unstructured and “fun” to let me rest my legs and clear my mind before the next phase of training. The week after that, I plan to conduct a lactate threshold test (LT being the heart rate at which the activity becomes anaerobic), which involves doing a 30-minute TT, with the average HR of the last 20 minutes serving as the approximation of LT. I don’t have any big expectations for this test, but I think the result will be at least 164 bpm, because the picture above shows I can hold that. The first of the two intervals I did was pretty easy, not the lung-busting effort you’d produce during a TT. However, I haven’t done a time trial since Fayetteville, so I might not be able to get my HR that high – but I think 168 bpm would be a reasonable expectation, or maybe even 170 bpm if I can get it that high.

One of the reasons LT is important is that cycling involves several different types of endurance: aerobic endurance, anaerobic endurance, and muscular endurance (there might be more to the list). Throughout the offseason, I’ll need to isolate these different types of endurance and create workouts to develop each one. LT is the point at which physical activity becomes anaerobic and lactic acid begins accumulating in the muscles. The body can only hold so much lactic acid before the muscles fail – and it’s crucial in bicycle racing to out-endure your opponents at critical moments, such as the final moments of a breakaway. I noticed this year that I had trouble staying with the pack during hard efforts, and this is one of the things I’d like to work on before returning to racing. Therefore, in order to tailor workouts to fit my goals, I’ll need to know my LT so I can separate workouts (or parts of workouts) into aerobic and anaerobic segments.

I don’t have any plans to do anaerobic exercises yet, because it’s important to build a good amount of basic aerobic endurance during the offseason so that you’ll have enough of a base to handle high-intensity anaerobic workouts later on. Last year I didn’t ride much in the winter, and I think one of the reasons I ran into overtraining problems in the spring was that I hadn’t built up enough endurance in the offseason. Anyway, for the next month my goal is to increase the amount of time I can ride non-stop at 160+ bpm (or whatever turns out to be right under LT). I’ll keep the total time at 160+ bpm, but I’ll gradually shift the distribution of time between the two intervals toward the second interval. For the next 4 weeks (2 interval workouts per week):

Week 1: LT test (1 x 30 min); then 1 x 20 min and 1 x 20 min
Week 2: 1 x 20 min and 1 x 25 min; then 1 x 25 min and 1 x 25 min
Week 3: 1 x 25 min and 1 x 30 min; then 1 x 30 min and 1 x 30 min
Week 4: 1 x 25 min and 1 x 35 min; then 1 x 20 min and 1 x 40 min

(Transition week + 2nd LT test)

I would eventually like to be able to ride nonstop right below my LT for an hour. I think realistically that will be possible at the end of November. I’m planning to do long weeks for base training in November and December (10-11 hours/week), which aren’t long by competitive standards, but they’re longer than what I did last year and what I believe to be a reasonable goal for the offseason.

Weights

Weight training has continued as usual:

Squat: 5×5 195 lb
Deadlift: 1×5 185 lb
Bench: 5×5 130 lb
Overhead press: 5×5 80 lb
Weighted Dips: 5×5 40 lb
Barbell Row: 5×5 125 lb
Weighted Pullups: 5×5 2.5 lb
Power Clean: 3×5 90 lb

I’m planning to make most of my gains this winter and next summer, when racing is light. I realized this year that it’s really hard to make gains during the spring when all the races are going on.

Posted in: Cycling, Logs

No. 74: Population Decline in Japan – Visualizing Data by Prefecture Using CRAN

3 September, 2012 11:37 PM / 1 Comment / Gene Dan

Oh geez…I hope I got the labels right

Hey everyone,

Last week, I wrote about creating map graphics with R, using Chinese GDP per capita as an example. This week I’m writing about the same thing, except this time the data cover the population of Japan. I made a few minor changes to the way I created the animated .gif files. Instead of using the first websites I could find, I installed GIMP, a free graphics editor similar to Photoshop. It gave me a lot more control over the size of the images, along with the frame rate. So, if you click on any of the images in this post, you’ll notice that the image quality is a little higher.

Population Decline in Japan

The aging of Japan has been an important issue for several decades, and the population is now set to decrease at an alarming rate. Census data show that the nationwide population peaked some time within the last five years and has already begun to decline; it is projected to fall by up to 50% by the end of the century should current trends continue. Now, why shouldn’t that be a good thing? Isn’t Japan (along with the rest of the world) overpopulated as it is? I think that over the long term, should the population stabilize at a lower level, we would see more sustainability and less depletion of natural resources. However, the transition to such a state isn’t pretty. If you look more closely at the census data, you’ll see that a dramatic shift in the age distribution has occurred over the last few decades.

In the image below, you’ll see some Excel files that I extracted from the Japanese Statistics Bureau depicting the median age by prefecture:

As you can see, most of the spreadsheet is in Japanese, but the prefectures are ordered by ISO 3166-2 territory code, which enabled me to figure out which prefecture was which. You can view the complete dataset here if you have Google Docs. From this table you’ll find that the median age has more than doubled in several prefectures. I extracted the figures into a .csv file and imported it into RStudio to generate the images used to create the animated .gif below, which depicts the shift in median age since 1920 (click for full resolution):

Median Age by Prefecture, 1920 – 2005
(Click on Image for Full Resolution)

The above image was generated using the code below (click ‘show source’ to show source):

[sourcecode language="r" toolbar="TRUE" wraplines="FALSE" collapse="TRUE"]
setwd("./Japan/")
library(maptools)
library(RColorBrewer)
##substitute your shapefiles here
state.map <- readShapeSpatial("JPN_adm0.shp")
counties.map <- readShapeSpatial("JPN_adm1.shp")
## these are the variables we will be plotting
jpnmedage <- read.csv("JpnMedAge1920-2005.csv",header=TRUE)
str(jpnmedage)
counties.map@data$x1920 <- jpnmedage$X1920
counties.map@data$x1930 <- jpnmedage$X1930
counties.map@data$x1940 <- jpnmedage$X1940
counties.map@data$x1950 <- jpnmedage$X1950
counties.map@data$x1960 <- jpnmedage$X1960
counties.map@data$x1970 <- jpnmedage$X1970
counties.map@data$x1980 <- jpnmedage$X1980
counties.map@data$x1990 <- jpnmedage$X1990
counties.map@data$x2000 <- jpnmedage$X2000
counties.map@data$x2005 <- jpnmedage$X2005
## put the lab point x y locations of the zip codes in the data frame for easy retrieval
labelpos <- data.frame(do.call(rbind, lapply(counties.map@polygons, function(x) x@labpt)))
names(labelpos) <- c("x","y")
counties.map@data <- data.frame(counties.map@data, labelpos)

plot.heat <- function(counties.map, state.map, z, title=NULL, breaks=NULL,
                      reverse=FALSE, cex.legend=1, bw=.2, col.vec=NULL, plot.legend=TRUE) {
  ## Break the value variable down into intervals
  if (is.null(breaks)) {
    breaks <- seq(floor(min(counties.map@data[,z], na.rm=TRUE)*10)/10,
                  ceiling(max(counties.map@data[,z], na.rm=TRUE)*10)/10,
                  .1)
  }
  counties.map@data$zCat <- cut(counties.map@data[,z], breaks, include.lowest=TRUE)
  cutpoints <- levels(counties.map@data$zCat)
  if (is.null(col.vec)) col.vec <- heat.colors(length(levels(counties.map@data$zCat)))
  if (reverse) {
    cutpointsColors <- rev(col.vec)
  } else {
    cutpointsColors <- col.vec
  }
  levels(counties.map@data$zCat) <- cutpointsColors
  plot(counties.map, border=gray(.8), lwd=bw, axes=FALSE, las=1,
       col=as.character(counties.map@data$zCat))
  if (!is.null(state.map)) {
    plot(state.map, add=TRUE, lwd=1)
  }
  ## Edit the legend information here
  if (plot.legend) legend("bottomleft", inset=c(0.03,0),
                          c("20-25","25-30","30-35","35-40","40-45","45+"),
                          fill=cutpointsColors, bty="n", title=title, cex=cex.legend)
  ## title("Cartogram")
}

for (yr in c(1920, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2005)) {
  plot.heat(counties.map, state.map, z=paste0("x", yr),
            breaks=c(20,25,30,35,40,45,Inf), col.vec=brewer.pal(6,"Purples"),
            plot.legend=TRUE, reverse=FALSE, title="Median Age")
  title(main=paste0("Japan: Median Age by Prefecture, ", yr))
}

## plot text
# with(counties.map@data[c(-2,-10,-25,-28,-32),], text(x,y,NAME_1,cex=.7,font=2))
# with(counties.map@data[2,],text(x,y+1.1,NAME_1,cex=.7,font=2))
# with(counties.map@data[28,],text(x+1.7,y+.7,NAME_1,cex=.7,font=2))
# with(counties.map@data[10,],text(x-.5,y-1,NAME_1,cex=.7,font=2))
# with(counties.map@data[25,],text(x+3.3,y,NAME_1,cex=.7,font=2))
# with(counties.map@data[32,],text(x-.7,y,NAME_1,cex=.7,font=2))
[/sourcecode]

To get a better idea of what the data mean, I should have included a graphic of the ratio of working to retired population (and maybe birth rates as well), but I didn’t have the time. Anyway, when the population of Japan declines, it won’t decline uniformly across the age groups. As the population falls, the ratio of old to young people will increase dramatically, putting a strain on younger workers to care for the old. This will result in a labor shortage, and possibly a pensions crisis, as the income of the younger workers won’t be enough to fund the pensions of the retired population. Japan also has a very high ratio of debt to GDP – about 200%, twice that of the United States – and I suspect the dual strains of caring for older workers and making interest payments on the debt will leave the next generation of Japanese without enough funds to provide for their own children, leading to a further decline in the birthrate and even fewer people to care for this generation when it retires – a vicious cycle.

For my next example, I calculated 5-year population growth rates by prefecture using the census data below, also obtained from the Japanese Statistics Bureau:

You can view the full dataset here with Google Docs. I used R code similar to the code shown previously to generate the animated image below (click on the picture for full resolution), which shows the change in growth rate by prefecture over time:

Projected Population Change, 1925-2025
(Click on Image for Full Resolution)
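The growth-rate calculation itself is simple division between consecutive censuses. Here’s a minimal sketch of how it could be done in R – the file name and column layout are my own assumptions, not the actual ones I used:

[sourcecode language="r"]
## Hypothetical layout: one row per prefecture, one population column per census year
jpnpop <- read.csv("JpnPop1920-2025.csv", header=TRUE)
for (yr in seq(1925, 2025, by=5)) {
  ## 5-year growth rate = (population this census / population last census) - 1
  jpnpop[[paste0("growth", yr)]] <-
    jpnpop[[paste0("X", yr)]] / jpnpop[[paste0("X", yr - 5)]] - 1
}
[/sourcecode]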

The graph below shows the national annual growth rate from 1873 to 2009:

In this graph, and in the animated picture above it, you can see the dramatic shift in the growth rate during the WWII years.

Final Notes

It’s clear to me that Japan will have to increase immigration or implement policies to stabilize the population. The nation will also have to implement policy reform to get its debt under control so as not to cripple the economy in the next few decades.

In this post I’ve also shown some of the other capabilities of the RColorBrewer package – you can see in the very top picture, along with the last map, that I used diverging color palettes instead of a sequential one. That was especially helpful for depicting negative growth rates.

Posted in: Mathematics / Tagged: japan median age by prefecture, map graphics with R, population of japan by prefecture, population decline in Japan, population of japan

No. 73: Geographical Mapping Graphics using CRAN – China GDP per Capita by Province

30 August, 2012 1:24 AM / 6 Comments / Gene Dan

Hey everyone,

A couple days back I learned a new technique for visualizing geographic data via mapping in R. This is something I’ve always wanted to do, and I’m glad I’ve received a project where this skill will be of some use (what you see here is not the actual project, fyi). Go ahead and click on the picture above. I created it using a combination of Excel, R, data from China NBS, and shapefiles from GADM and the U.S. Census Bureau. Basically, I gathered statistical data from China’s National Bureau of Statistics, manipulated it in Excel, linked it to information contained in the cartographic boundary files from GADM, and then composed the images you see here in R (using packages from CRAN, R’s package repository). What you see above is a heatmap of 2011 GDP per capita by province, with the shades of color linked to GDP per capita. The darker shades of blue represent regions with higher nominal average wealth and the lighter shades represent regions with lower nominal average wealth. You can see that in this country, wealth is concentrated on the east coast in large metropolitan areas (like Beijing, Tianjin, Shanghai, Hong Kong, and Ordos City), as well as in Inner Mongolia (represented as Nei Mongol on the map).

Data Manipulation in Excel

For this project, I extracted historical GDP per capita from China NBS using Excel by directly linking the cells to the website. My main task in this step was to reconcile the listing of the administrative regions in the cartographic boundary files with those listed by China NBS. I found five differences in nomenclature: Inner Mongolia, Ningxia, Xinjiang, and Tibet from China NBS are called Nei Mongol, Ningxia Hui, Xinjiang Uygur, and Xizang in the shapefiles, respectively, and I couldn’t find the region listed as “Paracel Islands” in the China NBS data at all. In this case there were few enough mismatches for me to correct them all manually.

I ran into another problem when I learned that Hainan used to be a part of Guangdong, and that Chongqing used to be part of Sichuan. This gave me headaches when I was trying to figure out how to divide the provinces into octiles for the graphics. I used the formula that you see in the above picture, which, for the most part, kept the separated provinces together (and in the same quantile!) for older years. I did, however, have to adjust one of the data points manually. I then created modified vectors to export into RStudio via .csv, and in this case each vector had a nice, convenient set of 32 points.
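For what it’s worth, a basic octile assignment could also be done directly in R instead of Excel. A hedged sketch – the oct2010 column is my own invention (chinagdp.csv and X2010 appear in the code further down), and this simple version wouldn’t replicate the special handling that keeps the split provinces together:

[sourcecode language="r"]
## Hypothetical alternative: assign each province to an octile of 2010 GDP per capita
chinagdp <- read.csv("chinagdp.csv", header=TRUE)
chinagdp$oct2010 <- cut(chinagdp$X2010,
                        breaks = quantile(chinagdp$X2010, probs = seq(0, 1, 1/8), na.rm = TRUE),
                        labels = 1:8, include.lowest = TRUE)
[/sourcecode]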

Graphic Composition

My next task involved importing the data into RStudio and then writing R code to generate the graphics. I looked around on Stack Overflow and found threads here and here proposing ways to link a numeric data vector (in my case, GDP per capita for each province) to regions within an image. Neither proposal worked that well for me on its own, so I combined the two and made some modifications to the code for my purposes. Here’s the end result (if collapsed, click “show source” to see the code):

[code language="r" toolbar="TRUE" wraplines="FALSE" collapse="FALSE"]
setwd("./China/")
library(maptools)
library(RColorBrewer)
##substitute your shapefiles here
state.map <- readShapeSpatial("CHN_adm0.shp")
counties.map <- readShapeSpatial("CHN_adm1.shp")
## these are the variables we will be plotting
chinagdp <- read.csv("chinagdp.csv",header=TRUE)
counties.map@data$x1978 <- chinagdp$X1978M
counties.map@data$x1980 <- chinagdp$X1980M
counties.map@data$x1985 <- chinagdp$X1985M
counties.map@data$x1990 <- chinagdp$X1990M
counties.map@data$x1995 <- chinagdp$X1995M
counties.map@data$x2000 <- chinagdp$X2000
counties.map@data$x2005 <- chinagdp$X2005
counties.map@data$x2010 <- chinagdp$X2010

## put the lab point x y locations of the zip codes in the data frame for easy retrieval
labelpos <- data.frame(do.call(rbind, lapply(counties.map@polygons, function(x) x@labpt)))
names(labelpos) <- c("x","y")
counties.map@data <- data.frame(counties.map@data, labelpos)

plot.heat <- function(counties.map,state.map,z,title=NULL,breaks=NULL,reverse=FALSE,cex.legend=1,bw=.2,col.vec=NULL,plot.legend=TRUE) {
  ##Break down the value variable
  if (is.null(breaks)) {
    breaks=
      seq(
        floor(min(counties.map@data[,z],na.rm=TRUE)*10)/10
        ,
        ceiling(max(counties.map@data[,z],na.rm=TRUE)*10)/10
        ,.1)
  }
  counties.map@data$zCat <- cut(counties.map@data[,z],breaks,include.lowest=TRUE)
  cutpoints <- levels(counties.map@data$zCat)
  if (is.null(col.vec)) col.vec <- heat.colors(length(levels(counties.map@data$zCat)))
  if (reverse) {
    cutpointsColors <- rev(col.vec)
  } else {
    cutpointsColors <- col.vec
  }
  levels(counties.map@data$zCat) <- cutpointsColors
  plot(counties.map,border=gray(.8), lwd=bw,axes = FALSE, las = 1,col=as.character(counties.map@data$zCat))
  if (!is.null(state.map)) {
    plot(state.map,add=TRUE,lwd=1)
  }
  ##Edit the legend information here
  if (plot.legend) legend("bottomleft",c("1st","2nd","3rd","4th","5th","6th","7th","8th"), fill = rev(cutpointsColors),bty="n",title=title,cex=cex.legend)
  ##title("Cartogram")
}

# plot.heat(counties.map,state.map,z="x1978",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
# title(main="China: GDP per Capita by Administrative Region, 1978")

# plot.heat(counties.map,state.map,z="x1980",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
# title(main="China: GDP per Capita by Administrative Region, 1980")

# plot.heat(counties.map,state.map,z="x1985",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
# title(main="China: GDP per Capita by Administrative Region, 1985")

# plot.heat(counties.map,state.map,z="x1990",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
# title(main="China: GDP per Capita by Administrative Region, 1990")

# plot.heat(counties.map,state.map,z="x1995",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
# title(main="China: GDP per Capita by Administrative Region, 1995")

# plot.heat(counties.map,state.map,z="x2000",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
# title(main="China: GDP per Capita by Administrative Region, 2000")

plot.heat(counties.map,state.map,z="x2005",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
title(main="China: GDP per Capita by Administrative Region, 2005")

# plot.heat(counties.map,state.map,z="x2010",breaks=c(0,4,8,12,16,20,24,28,32),col.vec=brewer.pal(8,"Blues"),plot.legend=TRUE,reverse=TRUE,title="Octile \n (8th is the highest)")
# title(main="China: GDP per Capita by Administrative Region, 2010")

## plot text
with(counties.map@data[c(-2,-10,-25,-28,-32),], text(x,y,NAME_1,cex=.7,font=2))
with(counties.map@data[2,],text(x,y+1.1,NAME_1,cex=.7,font=2))
with(counties.map@data[28,],text(x+1.7,y+.7,NAME_1,cex=.7,font=2))
with(counties.map@data[10,],text(x-.5,y-1,NAME_1,cex=.7,font=2))
with(counties.map@data[25,],text(x+3.3,y,NAME_1,cex=.7,font=2))
with(counties.map@data[32,],text(x-.7,y,NAME_1,cex=.7,font=2))
[/code]

When I first created the graphics, the only color option I had was red, because the function heat.colors() only returns colors from that end of the spectrum. To fix that, I loaded the package RColorBrewer, which allowed me to choose sequential palettes in any “typical” hue – red, blue, green, etc. When I tried to run the code in the first Stack Overflow link, the script halted at the function plot.heat(), and it took me a while to figure out that it wouldn’t run because I didn’t have the function defined. Luckily, the second Stack Overflow link had the function defined (but at the same time, it didn’t have any code for labels). I thus combined the two to get the features I wanted.

I then changed the source files (the shapefiles along with the manipulated data from Excel) and assigned vectors for each year of data to separate variables. Some of the other changes I made were within the plot.heat() function itself (lines 23-49), mostly dealing with aesthetic features like the legend format. After doing this, I ran the code a few more times and noticed that the labels were all bunched up together, so I decided to manually change the label coordinates (lines 78-83) for regions that are located close together.

You’ll notice that I commented out a lot of the plots (lines 52-74), which keeps that code from executing – in this case I created the plots one at a time by commenting/uncommenting these blocks of code. I should make some further improvements later to get all the plots created as a batch.
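In the meantime, here’s a sketch of what such a batch might look like – the loop, the png() calls, and the output file names are my own additions, not part of the original script:

[sourcecode language="r"]
## Hypothetical batch version: loop over the years and write one PNG per map,
## reusing the plot.heat() function and break points defined above
for (yr in c(1978, 1980, 1985, 1990, 1995, 2000, 2005, 2010)) {
  png(sprintf("china_gdp_%d.png", yr), width=800, height=600)
  plot.heat(counties.map, state.map, z=paste0("x", yr),
            breaks=c(0,4,8,12,16,20,24,28,32),
            col.vec=brewer.pal(8,"Blues"), plot.legend=TRUE, reverse=TRUE,
            title="Octile \n (8th is the highest)")
  title(main=paste0("China: GDP per Capita by Administrative Region, ", yr))
  dev.off()
}
[/sourcecode]

Note that the label-plotting calls at the end of the script would also need to move inside the loop (before dev.off()) for the labels to appear on each image.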

Result

You can see from the above that in recent decades, much of the wealth has shifted eastward. Most strikingly, Tibet and Qinghai were once among the richest regions of China and have since become some of the poorest (see the earlier image from Excel for the actual rankings). Since GDP per capita is just a single variable, I don’t want to make any hasty conclusions about income inequality among the provinces yet, as I’ll have to take a deeper look into the data for further analysis. What’s cool about this procedure is that it can be reused with several types of data just by changing the vectors in the source code. I combined all of the images into an animated .gif, which lets you see how the distribution of wealth has changed over time:

That’s it for today. I hope you enjoyed it!

Posted in: Mathematics / Tagged: china gdp per capita by province, geographic mapping graphics, mapping graphics, mapping graphics from CRAN, mapping graphics with r, plot.heat, plot.heat r, RColorBrewer, stack overflow

No. 72: A Few Updates

28 August, 2012 3:03 AM / Leave a Comment / Gene Dan

Hey everyone,

I’ve been very busy lately, but here are a few things that I’ve been doing/working on:

MySQL

I started working with data stored in SQL Server, which was tough at first because the only way (that I know of) to extract data directly is to write queries in T-SQL (Transact-SQL, Microsoft’s variant of SQL). There’s been a lot of back-and-forth communication between me and another IT team across the country, and they mainly communicate with me in T-SQL, so I figured I would have to learn the language – not just to understand it but also to use it independently in my own work. I started reading SQL Server Bible to get a better idea of the language, but that book involves a lot of database and IT administration, which made it very hard for me to understand. Thus, I decided to take a step back and read a more basic book on databases, and I was torn between Modern Database Administration and Learning SQL.

It’s been a stagnant two months and my technical skills haven’t progressed much (actually, I take that back – I improved quite a bit because I had to use MS Access to manipulate data for reports), though I did reinforce them through the use of VBA and R on my existing projects. Last week, I finally settled on Learning SQL, because it’s not so hard that I’m completely lost reading it, but it’s challenging enough for me to learn something. The book covers the MySQL variant of SQL, which is nice because it’s open source, so I don’t have to pay $8,000 to use it (as I would with SQL Server). The author says that all the SQL code should work in SQL Server and other platforms, so I don’t have to worry about incompatibility.

Fitness

It took me a long time to get back in shape, but I’ve finally gotten around to riding somewhat consistently over the last month. I started out with easy miles just to get used to riding again, and more recently I’ve been doing some long intervals right under lactate threshold (or what I think is my lactate threshold) at 160-170 bpm. I started with 2×5 minute intervals, gradually added five minutes to each interval over the last month, and yesterday I did 2×20 minute intervals at around 165 bpm:

As you can see, I still have a lot of work to do. A few months ago I was able to generate 230-250 watts of power at the same heart rate – but if you look closely, you can see that I can now maintain a heart rate above 160 bpm for longer than I could in March. So it looks like I can do some things that I couldn’t do before, but at the same time I’ve lost some of my power output. I’m not too concerned about that, because I’m planning to start racing again at the start of next season, and I can worry about high-intensity efforts later.

I’ve been reading Joe Friel’s The Cyclist’s Training Bible, and in that book he stresses the importance of improving your weaknesses during training. I know that I have a lot of weaknesses, and I’ve decided to work on them one at a time throughout the offseason and next year. Building a base – aerobic endurance – is what I believe to be my most important weakness, so I’ve decided to keep increasing my interval times until I can hold 160 bpm for 60 minutes (2×30 minute intervals), which should take me about two weeks. As soon as I’m done with that, I’ll conduct a lactate threshold test, then take a recovery week and work out a training schedule from there.

Posted in: Cycling, Logs
