For those of you who don’t know what Bitcoin is, it’s a digital currency that’s been gaining attention over the last few years, mostly due to its obscure user base, popularity on the black market (although most bitcoin transactions are legal), and its exchange rate volatility versus the U.S. dollar.
I’ve been interested in Bitcoin for quite some time because, unlike cash transactions, all bitcoin transactions are recorded on a publicly available ledger called the blockchain. Because the blockchain records every transaction that occurs over the Bitcoin network, it can be a valuable source of information, revealing patterns in peer-to-peer monetary transactions that could not be studied under traditional currencies for lack of data.
I stumbled across some CSV files on the internet, available here, that contain parsed blockchain information in a script-friendly format. Using this dataset, I wrote an R script that extracts the transactions among the first 500 bitcoin users and writes them out as a GEXF graph file:
    # import dataset (V2 and V3 hold the two user ids of each transaction)
    edges <- read.csv("user_edges.txt", header = FALSE)
    head(edges)

    ### subset first n users: both user ids within the limit, no self-transactions
    lim <- 500
    edges.sub <- edges[edges$V2 <= lim & edges$V3 <= lim & (edges$V2 != edges$V3), c("V2", "V3")]
    head(edges.sub, 500)

    ### drop repeated user pairs and number the remaining edges
    sub.unique <- edges.sub[!duplicated(edges.sub), ]
    sub.unique$edgenum <- seq_len(nrow(sub.unique))
    head(sub.unique)
    sub.unique$edges <- paste0('<edge id="', sub.unique$edgenum,
                               '" source="', sub.unique$V2,
                               '" target="', sub.unique$V3, '"/>')

    ### build nodes: every user id that appears in at least one edge
    nodes <- data.frame(id = sort(unique(c(sub.unique$V2, sub.unique$V3))))
    nodes$nodestr <- paste0('<node id="', nodes$id, '" label="', nodes$id, '"/>')
    head(nodes)

    ### build metadata (GEXF header)
    gexfstr <- paste(
      '<?xml version="1.0" encoding="UTF-8"?>',
      '<gexf xmlns:viz="http://www.gexf.net/1.1draft/viz" version="1.1" xmlns="http://www.gexf.net/1.1draft">',
      '<meta lastmodifieddate="2010-03-03+23:44">',
      '<creator>Gephi 0.7</creator>',
      '</meta>',
      '<graph defaultedgetype="undirected" idtype="string" type="static">',
      sep = "\n")

    ### append nodes
    gexfstr <- paste0(gexfstr, '\n<nodes count="', nrow(nodes), '">\n',
                      paste0(nodes$nodestr, "\n", collapse = ""),
                      '</nodes>\n')

    ### append edges and print to file
    gexfstr <- paste0(gexfstr, '<edges count="', nrow(sub.unique), '">\n',
                      paste0(sub.unique$edges, "\n", collapse = ""),
                      '</edges>\n</graph>\n</gexf>')
    writeLines(gexfstr, "output.gexf")
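To make the filtering and de-duplication step of the script easier to follow, here is a minimal sketch on a made-up edge table. The six rows and the limit of 3 are hypothetical, chosen only for illustration; the column names `V2` and `V3` mirror the layout the script assumes for `user_edges.txt`.

```r
# Hypothetical toy edge list (not real blockchain data).
# V2 and V3 are the two user ids of each transaction, as in the script.
edges <- data.frame(
  V1 = 1:6,
  V2 = c(1, 1, 2, 3, 3, 7),
  V3 = c(2, 2, 3, 3, 9, 1)
)
lim <- 3  # keep only users with id <= 3 (the post uses 500)

# Keep rows where both users are within the limit and the edge is not a self-loop.
edges.sub <- edges[edges$V2 <= lim & edges$V3 <= lim & edges$V2 != edges$V3,
                   c("V2", "V3")]

# Collapse repeated user pairs into a single edge and number the edges.
sub.unique <- edges.sub[!duplicated(edges.sub), ]
sub.unique$edgenum <- seq_len(nrow(sub.unique))

# One GEXF <edge/> element per remaining pair.
edge_xml <- paste0('<edge id="', sub.unique$edgenum,
                   '" source="', sub.unique$V2,
                   '" target="', sub.unique$V3, '"/>')

# Nodes are the user ids that still appear in at least one edge.
node_ids <- sort(unique(c(sub.unique$V2, sub.unique$V3)))

print(edge_xml)
print(node_ids)
```

Of the six rows, the duplicate 1-2 pair is collapsed, the 3-3 self-transaction is dropped, and the rows involving users 7 and 9 fall outside the limit, so two edges (1-2 and 2-3) and three nodes remain. Note that de-duplication discards how often two users transacted, so the exported graph records only whether a pair transacted at all.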
I then imported the output file into Gephi to create a network visualization of the transactions. You can view the process in the video below.
https://www.youtube.com/watch?v=wjw0ksaRSO4&feature=youtu.be
The resulting graph:
Here you can see that Gephi's modularity algorithm has identified clusters of tightly knit users who transact mainly with each other, along with influential users who may be running businesses or serving as middlemen between other groups of users.
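For readers unfamiliar with modularity, the sketch below scores a given grouping of nodes on a tiny made-up graph using the standard modularity formula, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside cluster c, and d_c the summed degree of its nodes. It is only an illustration in base R: the graph, the groupings, and the function name `modularity_score` are all hypothetical, and it evaluates a partition rather than searching for the best one, which is the part Gephi does for you.

```r
# Toy undirected graph (hypothetical): two triangles, 1-2-3 and 4-5-6,
# joined by a single bridge edge 3-4.
edges <- data.frame(
  from = c(1, 1, 2, 4, 4, 5, 3),
  to   = c(2, 3, 3, 5, 6, 6, 4)
)

# Modularity of a given partition.
# membership[i] is the cluster label of node i.
modularity_score <- function(edges, membership) {
  m <- nrow(edges)
  comm_from <- membership[edges$from]
  comm_to   <- membership[edges$to]
  q <- 0
  for (comm in unique(membership)) {
    internal   <- sum(comm_from == comm & comm_to == comm)        # L_c
    degree_sum <- sum(comm_from == comm) + sum(comm_to == comm)   # d_c
    q <- q + internal / m - (degree_sum / (2 * m))^2
  }
  q
}

two_groups <- c(1, 1, 1, 2, 2, 2)  # one cluster per triangle
one_group  <- rep(1, 6)            # everything in a single cluster

q_two <- modularity_score(edges, two_groups)  # 5/14, about 0.357
q_one <- modularity_score(edges, one_group)   # 0

# Node degree is a simple first proxy for "influential" users.
degree <- table(c(edges$from, edges$to))
```

Splitting the graph along the bridge scores higher than lumping everything together, which is the sense in which a modularity-based method prefers tightly knit clusters. Nodes 3 and 4, the two ends of the bridge, have the highest degree (3), loosely mirroring the "middleman" users in the visualization.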