• Home
  • Readings
  • Github
  • MIES
  • TmVal
  • About
Gene Dan's Blog

Monthly Archives: October 2013

You are browsing the site archives by month.

No. 100: Strike Incidents – Visualizing Data with ggplot2

17 October, 2013 7:55 PM / Leave a Comment / Gene Dan

Today, I’d like to demonstrate some of ggplot2‘s plotting capabilities. ggplot2 is a package for R written by Hadley Wickham, who teaches nearby at Rice University. I’ve been using it as my primary tool for visualizing data both at work and in my spare time.

Wildlife Strikes

The Federal Aviation Administration oversees regulations concerning domestic aircraft. One of the interesting things they keep track of are collisions between aircraft and wildlife – the vast majority of which involving birds. This dataset is available for download from the FAA website.

It turns out that bird strikes happen a lot more frequently than I thought. A quick look at the database revealed that 146,607 reported incidents have occured since 1990 (about 11,000 per year), with the number of birds killed ranging anywhere from 1 to over 100 per incident (cases in which an airplane flew through a flock of birds). Although human deaths are infrequent (less than 10 per year), bird strikes cause substantial physical damage at around $400 million per year

ku-bigpic

An example of bird-related damage.

Data Preparation
There are tools available that allow R to communicate with databases via ODBC connectivity. The database concerning strike incidents is in .mdb format which can be accessed with the package RODBC.

The code below loads the package RODBC along with ggplot2 and returns a brief summary of our dataset:

1
2
3
4
library(RODBC)
library(ggplot2)
 
channel

str

As you can see, data in this form are not easy to interpret. R’s visualization tools allow us to transform this data into something meaningful for our audience.

RODBC lets us send SQL queries to the database connection to aggregate and summarize the data. I found MSAccess’ particular SQL implementation to be very awkward, but I was able to alter my code enough to succesfully return the number of incidents by year:

1
2
3
4
5
6
7
8
strikes.yr <- sqlQuery(channel,"SELECT [INCIDENT_YEAR] AS [Year],
                                      COUNT(*) AS [Count]
                                      FROM [StrikeReport]
                                      GROUP BY [INCIDENT_YEAR]
                                      ORDER BY [INCIDENT_YEAR]")
 
strikes.yr <- aggregate(Count~Year,FUN=sum,data=strikes)
strikes.yr

stryr

Next, we use the function ggplot from ggplot2 to create a bar plot of our data:

1
2
3
4
ggplot(strikes.yr,aes(x=Year,y=Count))+
  geom_bar(stat="identity",fill="lightblue",colour="black")+
  theme(axis.text.x = element_text(angle=60, hjust=1,size=rel(1)))+
  ggtitle("Strike Incidents by Year")+theme(plot.title=element_text(size=rel(1.5),vjust=.9))

strike_yr

From the diagram above, we can see that the number of strikes has increased substantially over the last 24 years. However, we can’t make any definite conclusions without more information. Perhaps this increase is due to more aircraft being flown for longer durations. Or maybe it’s due to better/more accurate reporting of incidents. Notice that the bar for 2013 is much lower than the previous year. The most straightforward explanation for this observation is that 2013 isn’t over yet, and not all the incidents that have happened have been reported yet.

We can use ggplot2 to see the proportion of strikes that have been incurred by the military. The fill argument specifies that the bars should be segmented by operator type:

1
2
3
4
5
ggplot(strikes,aes(x=Year,y=Count,fill=OperatorType))+
  geom_bar(stat="identity",colour="black")+
  scale_fill_brewer(palette="Pastel1")+
  theme(axis.text.x = element_text(angle=60, hjust=1,size=rel(1)))+
  ggtitle("Strike Incidents by Year")+theme(plot.title=element_text(size=rel(1.5),vjust=.9))

strike_militry

Now, let’s take a look at which aircraft operators have experienced the most accidents over the last 24 years. The code below sends a query to the database connection that summarizes the incidents by operator:

1
2
3
4
5
6
7
8
9
strikes.oper <- sqlQuery(channel,"SELECT [OPERATOR] AS [Name],
                                      COUNT(*) AS [Count]
                                      FROM [StrikeReport]
                                      GROUP BY [OPERATOR]
                                      ORDER BY COUNT(*) DESC")
 
opr.order <- strikes.oper$Name[order(strikes.oper$Count)]
strikes.oper$Name <- factor(strikes.oper$Name,levels=opr.order)
strikes.oper

oper

In this case, the data were not ordered upon extraction, so the two lines after the SQL query reorder the strikes.oper vector by strike count.

We can now use this data to visualize the top 30 operators with a dot plot:

1
2
3
4
5
ggplot(strikes.oper[2:31,],aes(x=Count,y=Name))+geom_segment(aes(yend=Name), xend=0, colour="grey50") +
  geom_point(size=3.5,colour="#2E64FE")+
  theme_bw() +
  theme(panel.grid.major.y = element_blank())+
  ggtitle("Strike Incidents by Operator, 1990-Present ")+theme(plot.title=element_text(size=rel(1.5),vjust=.9))

strike_oper

It turns out United Airlines happens to have the most incidents, followed by an ambiguous class called “Business” (perhaps private aircraft), the military, and Southwest Airlines. Keep in mind that this says nothing about the frequency of such incidents, only the magnitude. We’ll need more data in order to make inferences about things like safety. It could be the case that United and Southwest are relatively safe airlines, but only have many incidents because they fly the most hours.

So, which animals happen to have the most strike incidents? The code below summaries the data and plots it in a dot plot:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
strikes.spec <- sqlQuery(channel,"SELECT [Species],
                                      COUNT(*) AS [Count]
                                      FROM [StrikeReport]
                                      GROUP BY [SPECIES]
                                      ORDER BY COUNT(*) DESC")
 
spec.order <- strikes.spec$Species[order(strikes.spec$Count)]
strikes.spec$Species <- factor(strikes.spec$Species,levels=spec.order)
 
ggplot(strikes.spec[1:30,],aes(x=Count,y=Species))+geom_segment(aes(yend=Species), xend=0, colour="grey50") +
  geom_point(size=3.5,colour="#FF0000")+
  theme_bw() +
  theme(panel.grid.major.y = element_blank())+
  ggtitle("Strike Incidents by Species, 1990-Present")+theme(plot.title=element_text(size=rel(1.5),vjust=.9))

strike_spec

Most of the time, the type of animal is unidentifiable (go figure), but when it is the most likely case is that it’s a gull or a mourning dove. There are many more species in the database than are depicted in the chart above, which only has the top 30. Below is a partial list of species:

spec

Interestingly, I saw some land animals in the dataset, such as elk, caribou, and aligators (not shown above, but you can see coyotes).

Posted in: Logs, Mathematics / Tagged: bird strike, bird strike R, bird strike statistics, FAA strike wildlife strike database, ggplot2, visualizing data, wildlife strike statistics

Archives

  • September 2023
  • February 2023
  • January 2023
  • October 2022
  • March 2022
  • February 2022
  • December 2021
  • July 2020
  • June 2020
  • May 2020
  • May 2019
  • April 2019
  • November 2018
  • September 2018
  • August 2018
  • December 2017
  • July 2017
  • March 2017
  • November 2016
  • December 2014
  • November 2014
  • October 2014
  • August 2014
  • July 2014
  • June 2014
  • February 2014
  • December 2013
  • October 2013
  • August 2013
  • July 2013
  • June 2013
  • March 2013
  • January 2013
  • November 2012
  • October 2012
  • September 2012
  • August 2012
  • July 2012
  • June 2012
  • May 2012
  • April 2012
  • March 2012
  • February 2012
  • January 2012
  • December 2011
  • September 2011
  • August 2011
  • July 2011
  • June 2011
  • January 2011
  • December 2010
  • October 2010
  • September 2010
  • August 2010
  • June 2010
  • May 2010
  • April 2010
  • March 2010
  • September 2009
  • August 2009
  • May 2009
  • December 2008

Categories

  • Actuarial
  • Cycling
  • Logs
  • Mathematics
  • MIES
  • Music
  • Uncategorized

Links

Cyclingnews
Jason Lee
Knitted Together
Megan Turley
Shama Cycles
Shama Cycles Blog
South Central Collegiate Cycling Conference
Texas Bicycle Racing Association
Texbiker.net
Tiffany Chan
USA Cycling
VeloNews

Texas Cycling

Cameron Lindsay
Jacob Dodson
Ken Day
Texas Cycling
Texas Cycling Blog
Whitney Schultz
© Copyright 2025 - Gene Dan's Blog
Infinity Theme by DesignCoral / WordPress