Hi all! In preparation for some upcoming interviews, I've been working diligently through a variety of tutorial courses at udemy.com. It's been an exciting and enriching time, and I'll share some specifics (with screenshots) below. A subset of the coursework and progress can be found in several new repositories on my GitHub (github.com/bradleypmartin).
Focus area 1: A basic image processing algorithm written in Python
A lot of the items to follow (big data tutorials; core coding in new languages) were sort of 'guided tours' through the various spaces. This mini-project, though - for whatever it's worth - was totally my own creation! I've been thinking a lot about image processing lately, and thought it'd be neat to come up with a (naive) algorithm that can reconstruct a tiled and scrambled image.
This algorithm cuts an image into a user-defined number of rows and columns, gives each resulting subimage some "overlap" pixels, offsets each subimage by a randomly chosen (and smaller) number of pixels, and then scrambles the subimages' order and removes any associative data between them.
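To make the cut / overlap / offset / scramble step concrete, here's a minimal sketch using NumPy. This is not the code from the ImageReconstruction repo: the function name `tile_and_scramble`, its parameters, and the default values are all made up for illustration, and it assumes the image is a NumPy array of shape (height, width) or (height, width, channels).

```python
import numpy as np


def tile_and_scramble(image, rows, cols, overlap=8, max_offset=3, seed=0):
    """Cut `image` into rows x cols overlapping, jittered tiles in random order.

    All names and defaults here are illustrative assumptions.
    """
    if max_offset >= overlap:
        raise ValueError("max_offset should be smaller than overlap")
    rng = np.random.default_rng(seed)
    height, width = image.shape[:2]
    tile_h, tile_w = height // rows, width // cols

    tiles = []
    for r in range(rows):
        for c in range(cols):
            # Random shift, kept smaller than the overlap so that
            # neighbouring tiles still share some pixels.
            dy, dx = (int(v) for v in rng.integers(-max_offset, max_offset + 1, size=2))
            top = max(0, r * tile_h - overlap + dy)
            bottom = min(height, (r + 1) * tile_h + overlap + dy)
            left = max(0, c * tile_w - overlap + dx)
            right = min(width, (c + 1) * tile_w + overlap + dx)
            tiles.append(image[top:bottom, left:right].copy())

    # Scramble the order; only pixel data is returned, so no
    # row/column bookkeeping survives.
    order = rng.permutation(len(tiles))
    return [tiles[i] for i in order]
```

Calling `tile_and_scramble(image, rows=3, cols=4)` returns 12 tiles of slightly different sizes, with nothing attached to say where each one came from. Tiles at the image border are simply clipped; other ways of handling the edges would work too.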
Next, the 'fun part' begins: I initialize and fill a graph of subimage relationships based on pixel similarity, then attempt (!) to reconstruct the original order from that graph. The GitHub repo for this project (ImageReconstruction) has three separate .jpg files I've tried so far, and the algorithm works pretty well on them (usually 100% reconstruction accuracy) for small numbers of 'cuts' (around 15-25 tiles total). Finer divisions of the photos bring a lot more computational overhead and a less accurate reconstruction, and accuracy drops off fairly quickly once the number of subimages goes past about 20-25.
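One simple way to picture a pixel-similarity graph like this is sketched below. It's a simplified stand-in rather than the repo's actual method: it only looks for right-hand neighbours, it assumes the tiles overlap by exactly `strip` columns with no random offset, and the names `edge_mismatch` and `build_right_neighbor_graph` are hypothetical.

```python
import numpy as np


def edge_mismatch(left_tile, right_tile, strip=10):
    """Mean absolute difference between the right strip of one tile
    and the left strip of another (lower means more similar)."""
    a = left_tile[:, -strip:].astype(np.float64)
    b = right_tile[:, :strip].astype(np.float64)
    if a.shape != b.shape:
        # Tiles of different heights can't be compared in this simple sketch.
        return float("inf")
    return float(np.mean(np.abs(a - b)))


def build_right_neighbor_graph(tiles, strip=10):
    """Map each tile index to a list of (mismatch, candidate index) pairs,
    sorted so the most likely right-hand neighbour comes first."""
    graph = {}
    for i, left in enumerate(tiles):
        scores = [
            (edge_mismatch(left, right, strip), j)
            for j, right in enumerate(tiles)
            if j != i
        ]
        graph[i] = sorted(scores)
    return graph
```

Every tile is scored against every other tile, so the work grows roughly with the square of the number of tiles. Handling the random offsets would also mean searching over possible shifts for each pair, which this sketch leaves out.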
Future work could involve any or all of the following:
Figure 1. My cat, Chocko, offered her modeling talent for the image processing project. I liked the Jupyter notebook environment for preliminary work on the project due to its easy visualization of various steps/modules-in-progress.
Focus area 2: 'Big data' tutorials and exercises
'Big data' tools and technologies are in huge demand at many of the places where I'm applying to work, so I'm pushing forward with a lot of tutorial material. So far that covers using Spark (and other parts of the Hadoop ecosystem) to access and manipulate large, distributed datasets; creating and querying databases with MySQL; developing in Cloud9; and deploying big data algorithms to AWS clusters.
It's a lot to take in over a short period of time, but I'm very glad to be doing so; there's so much potential for engaging in cool new projects with all these new tools! A couple of screenshots of my early progress are below.
Figure 2a. Here's a screenshot of my starting MySQL development environment in AWS Cloud9. I'm looking forward to creating (well, following along in creating, at least!) a web application associated with this course today (4/30/18).
Figure 2b. This was a fun process: I had spun up 5 m4.xlarge nodes on AWS to run some analytics on a 1 million-member movie ratings dataset as part of one of Frank Kane's online tutorials. I was using a Python/Spark interface here.
Focus area 3: Other new programming languages and development environments
Alongside the 'big data' tutorials and exercises, I've been broadening my proficiency in new languages and environments. MATLAB and C++ have served me well in the past, but I wanted to start branching out. Scala, Python, and MySQL have been a big part of the efforts described above, and I've also been building some core competency in Java. The exercises so far have brought me through a whole new wealth of text editors and IDEs (each with its own quirks and benefits - for example, I love how Git version control is baked right into many of them!).
You can see some examples of Python scripts and notebook work in both the ImageReconstruction repository mentioned above and the SparkCourse repo. I've got some Scala code in the ScalaAndSparkCourse repo, and my core Java work is in the JavaCoreExercises repo.
Some of these repositories are better developed than others, but I hope to keep building on each as time permits.
Figure 3. I bet you haven't seen a 'Hello World' implementation like this before! But honestly, there's some more interesting stuff (classes/inheritance, exception handling, etc.) in my associated JavaCoreExercises repository.