I haven’t written in here for a long time, so I’d like to list what I’ve been doing these past few months to get the ball rolling:
Databases:
I took an interest in the study of databases late last year, and I’ve gotten a little further by studying the underlying concepts of database design and data modeling. I took the initiative to learn the subject after struggling with the technical aspects of my job. Last year, when I was first working with generalized linear models, I quickly found out that I couldn’t design the queries necessary to get the data I needed, and that I couldn’t communicate effectively with the IT personnel who were responsible for implementing my pricing models (in this case, implementation means to program the database systems that the business uses to store and capture information). Whenever I would ask them something, they would respond via their language of choice, T-SQL, which I had seen glimpses of during my internship days but couldn’t understand. At that time, I only had a vague notion of what a relational database was, and that it was somehow fundamentally different from an Excel spreadsheet, although I didn’t know why; if you ask me today I’d be able to tell you some of the fundamental differences – such as how a database has strictly defined fields and entity relationships, whereas Excel does not, etc. (well you could design an Excel spreadsheet to have those features, but it’s impractical).
This was perhaps the first time I thought to myself that maybe learning about databases would be important. I took a look at some books on SQL and another one that a co-worker checked out of the library, and determined that it would take me many months, if not a few years of investment to acquire proficiency in the subject. I encounter unfamiliar software and bits of code on a daily basis, so I often have to make the decision on whether or not I should spend a significant amount of time studying a software suite or language, or if I should just learn enough of it to complete my next assignment. For example, sometimes (about once every six months or so) I have to run some obsolete legacy software that I know will soon be replaced in the near future, so I’ll decide to learn just enough to get what I need. On the other hand, I see articles almost every single week pertaining to either SQL or relational databases on Hacker News, and several of my friends at coder meetup groups (which I’ll explain in the next section) use databases or write code pertaining to them, so I saw this as another sign that studying the subject would be worthwhile.
If you are not familiar at all with what I’m talking about, you may have seen stories in the news about cybercriminals (ranging from teenage misfits to professional cybersoldiers) infiltrating corporate and government databases, and you may have heard or seen the phrase “SQL Injection” thrown around in the technology section of newspapers. These concepts pertain to databases, which serve as centralized repositories for storing important information for businesses and governments. Prior to working in the corporate sector, news stories such as these were the only exposure I had to databases, and perhaps Americans who are working in non-technical jobs have a similar level of exposure, if any, at all. I think the media sensationalizes cybercrime because it deals with technology and concepts that the average citizen finds esoteric. I think perhaps stories like these, along with recent revelations during the past week should alert you to the importance of not only databases but also computer and technological literacy itself. I have some opinions on the matter but I believe that they are not yet mature enough for full disclosure now (maybe later), other than my belief that our nation is in a very precarious situation pertaining to civil liberties and privacy – and that our technological literacy is crucial to protecting ourselves from oppression. Perhaps this was the reason that influenced me to study databases seriously, or maybe it was really because I thought it would be important for work, I’m not so sure.
Anyway, I started getting some assignments at work done by first learning basic SELECT queries and joins, which allowed me to manipulate the data into the format I needed for my models. Lately, I’ve been looking in to theories on database design, which at least in my impression seems like some kind of applied set theory. I’ve also gotten involved in some open-source project dealing with databases, which I won’t reveal at this moment (maybe later). All I have to say is that I’m pretty excited to be working on something worthwhile.
Computer User Groups
I joined about 10 computer user groups on meetup.com. These user groups are weekly or monthly gatherings of people who want to talk about interests they have in common – such as music, sports, and various other hobbies. In my case, I’ve joined groups that discuss Python, Big Data, Perl, Linux, and Machine Learning that meet on a monthly basis. I’ve only been to the Python one so far, but I’m planning on going to a machine learning gathering in a few weeks. I’ve met some really cool people and learned a lot via these meetups.
Linear Algebra
I think a couple years ago I wrote about reviewing the mathematics that I learned in high school, and you may have noticed that on my “Readings” page, I’ve been stuck on my College Algebra book at 18.9% over the past 2 years. In that time I hadn’t stopped learning mathematics – I passed 3 actuarial exams in that time. However, now that I’m done with the preliminary exams I’m reluctant to go back to it, as when I was reviewing basic algebra, I was so bored going over things I had learned in the past and wasn’t patient enough to sit through it to make steady progress. Furthermore, I felt like I was spending so much time reviewing that I couldn’t devote enough effort to learning the new mathematics that I’m interested in. Therefore, I’ve decided to limit my “reviewing” to about 30 minutes a night, but make it mandatory. The rest of my time will be devoted to databases and linear algebra.
I took an intense course in Linear Algebra over a five-week span and did well, but due to the short time over which I learned the material (and the fact that it’s been 5 years since I took the course), I’ve forgotten most of it. However, I’ve seen matrices appear more and more often in papers and in the work I’ve been doing (a computer uses matrix calculations to perform linear regression), so I decided to brush up on my linear algebra. This is technically a review, but since I spent a such a short time on the course the first time, I consider the material “new” enough to count as furthering my studies of mathematics. I also found the course very enjoyable, so I think this will give me a fresh perspective as I fill the gaps in my early education (after college algebra, I plan to review geometry, trig, and calculus) alongside the study of this subject.
Advances in computing power have only recently allowed us to realize the power of statistical matrix computations on large data sets. Matrices have been around for centuries, and Statistics has been around for centuries, but large organizations were not, at least until a few years ago, able to practically perform statistical calculations on data sets, nor did they have the technological capability to digitally store the data in their systems to make such calculations possible. Now that we can, a new field called data science is emerging, and it’ll play a crucial role in society in the near future (perhaps “data science” and “big data” are just buzzwords – it’s really just applied statistics).
I’d like to close with a simple demonstration on how a system of linear equations can be represented as a matrix:
\[ \begin{array}{rrrrrrr} x_1&-&2x_2&+&x_3&=&9 \\ &&2x_2&-&8x_3&=&8 \\ -4x_1&+&5x_2&+&9x_3&=&-9 \end{array}\]
can be represented as the augmented matrix:
\[ \left[\begin{array}{rrrr}1&-2&1&0\\ 0&2&-8&8 \\-4&5&9&-9\end{array}\right] \]
The properties of matrices allow us to sovlve systems of equations like these very efficiently. This particular example isn’t anything special, but I just wanted to show off the new LaTeX package I installed after switching to self-hosting. I think it’s much better than the default plugin used by wordpress.com – and I can even choose which one to use. This package is powered by MathJax which I think displays the formulas much more elegantly and cleanly than before. In addition, you can highlight each component of each formula, which is an improvement over what I had been using before. I’m satisfied with this plugin, although typing the above example was kind of a pain because the WordPress syntax for LaTeX expressions is a little different than what you would use for TeXLive. Actually, I think the terms in the equations above are too spaced out for my taste. I tried using the {align} environment but it came out weird, so I settled for the {array} environment. Maybe I’ll change it later.
I also have a new theme installed, which I like better than the temporary substitute I used last week. I think I’ll keep this one for now.