One Month at Google: I Know Kung Fu

Well, it’s been a little over a month since I joined Google. I feel like Keanu Reeves in The Matrix: I know kung fu.

Not really, but I have learned a ton about Google. I took a Tech Immersion training class, I travelled to our gleaming Sunnyvale campus, I commuted on an infamous GBus (but in the wrong direction, reverse-commuting from the valley to downtown San Francisco), I visited the San Francisco office (where there are really cool views of the bay), and I mostly learned how to get around the sprawling Chelsea office in NYC. The Chelsea building has quite a lot of history. And yes, there is excellent food everywhere. I am trying to take the stairs and walk more to counter that!

First things first though: the tech, as you may have heard, is nuts. In 2.5 weeks I only scratched the surface of the enormity that is Google tech. Calling this training “immersion” may be a little… ambitious. It’s impossible to go deep with Google tech in that short a time. It’s just too big!

However, we did learn how to use the famous mono-repo for source code, the also famous massive code-build system (which is open-sourced as Bazel), the code search and code review tools, and all of the other basic tools you need for excellent software engineering. In typical Google fashion, there’s a good research paper that summarizes these things if you want to read up on it more. The source code repository has literally billions of lines of source code, and you can search all of it instantly. Google is pretty good at search, and this includes source code!

More on said mono-repo in this talk:

I also learned more about practices like Site Reliability Engineering (SRE), which Google famously wrote the book(s) on, Testing, Accessibility, Security & Privacy, and others. This part of the training program was quite thorough and enjoyable.

Even more important, I learned how to learn things at Google. Meaning, I learned the basic patterns of documentation, collaboration, team roles, structures, and all of the elements that go into working in an organization. Given that this is by far the largest organization I’ve worked for, this was something that I was concerned about. How do I figure out who to talk to? How do I find information about team X or product Y? How do I understand how things work, and how can I get things done? Which cafes are open now? I’m happy that I now have reasonably good answers to these questions.

Books I’m reading to supplement my learning:

Onward!

 

 

First Week at Google

 

I just completed my first week at Google. I joined Google Cloud’s Office of the CTO (OCTO), our two-way innovation street connecting enterprise CTOs and CIOs with the magic of Google. I’m just beginning the journey, but so far, it’s been pretty amazing.

I’ve spent the last two years at Blackstone, working in the Blackstone Innovations team, heading up Data Science (including Data Engineering and Data Visualization). As I noted previously, it was a little rocky getting set up technically. Much of this was due to my not having worked with Windows significantly for about five years before that, so there was a lot of re-learning I needed to do. Eventually, we upgraded to Windows 10 with Docker, and that helped a ton. That said, I’m glad to have a Mac again!

But what is this OCTO thing? I did a quick Google search and found a little thread on Hacker News. The questions started with:

Ask HN: What does it mean to work in the ‘Office of the CTO’?

I have been recently running into people that work in the ‘Office of the CTO’ at their respective companies. They tend to be engineers, sometimes with graduate degrees, that may be doing in research, integration, or advanced development. Is there a broader definition or guideline for understanding someone’s role when I encounter this terminology?

The responses are interesting, but not really what OCTO is going for, I think. So ignore that.

I found a similar question posted on Quora:

Just what is the ‘office of CTO’ for? Are folks working in these function units respected by engineers and developers?

I like Stan Hanks’ answer. He seems to describe something pretty close to what I think my role will look like:

Stan Hanks, I was a CTO before the title existed

Well… when I was CTO of Enron Broadband, the guys in the “Office of the CTO” were working on things about three years away from production. Answering big questions like “how can you prove that the digital artifacting in MP4 video over IP is not severe enough for 99.9% of viewers to observe?” and “as we get to 10Gb/s aggregation feeds, what does that mean for the maximum dwell time to touch each packet, given the progression of Moore’s Law?” and “would it be possible to stand up an edge distribution system in real time in the event that we noticed a huge amount of source pull from a single point source? and how would we notice that?”

All sorts of stuff.

I would have some 20% time contributions from key engineers in various departments who I wanted to keep engaged and out on the cutting edge, but for the most point, I had big-thinkers, not big-doers, who were willing to ask really hard questions and then GO THERE looking for answers – particularly answers that were at variance with what we had in the pipeline.

Except, I’ll be helping CTOs think about the future of cloud computing, not whatever Enron Broadband was doing… But I think the “all sorts of stuff” part will be spot-on.

Oh, and I need to become googley!

 

 

 

Talking in Front of People About Data

A few weeks ago I had the opportunity to speak on a panel at an event hosted by Bloomberg. It was fun! Here was the info about the event.

Bloomberg Speaker Series: Artificial Intelligence in Finance

Bloomberg invites you to join us for our 2nd Data Speaker Series Event. Our Speaker Series is a more intimate forum where we will discuss trending topics after working hours. Our goal for sponsoring these events is to gather a community of market leading practitioners to have a chance to network and share best practices.

Bloomberg kicked off the Data Speaker Series in February with a very engaging discussion regarding alternative data usage in the investment process. The goal of the May event is to have industry experts across investing and academia discuss AI & Machine Learning tools and applications in finance. Featured speakers include leaders from Citadel, Neuberger Berman, Blackstone and NYU.

Time and Location:

Wed, May 16, 2018, 6pm-7:30pm
Bloomberg Enterprise, 120 Park Avenue (41st and Park)

Agenda:

6:00 PM – Registration & Networking
6:30 PM – Panel Discussion
7:30 PM – Networking Reception

Speakers:

Moderators:

  • Ashish Singal, Senior Product Manager, Artificial Intelligence, Bloomberg
  • Jeremy Baksht, Global Head of Alternative Data, Bloomberg

Discussion Topics – AI in Finance:

  • What are the unique challenges applying AI to financial markets?
  • How is AI actually being used today both sell side and buy side in trading / investing?
  • In terms of culture / people, how do we bridge the gap? Do researchers understand markets and do traders understand technology?
  • What kind of models specifically hold the most promise and for which use cases?
  • How will AI help unlock value in unstructured data?

Here I am pontificating about how “data is an asset class”. Deep!

[Photo: speaking on the Bloomberg Data Speaker Series panel]

C&O Potatoes

For about a year after college I worked as a dishwasher in the nicest restaurant in Charlottesville, VA, the C&O. I was a pretty bad dishwasher, honestly, because I was too detail-oriented and therefore slow. I had to make sure each dish was perfect! Generally a bad plan for a busy kitchen. I mostly worked the slow nights.

However, the one big upside was that I was tasked with making the potatoes. I have carried this skill with me, and every year I make at least one batch, usually for Thanksgiving. They are still on the dinner menu as “gruyere-thyme smashed potatoes”, accompanying the Steak Chinoise, which was the most popular dish when I worked there 20 years ago. I bet it still is.

This is how I make C&O potatoes. (This post has nothing to do with data.)

I usually use 10 pounds of red potatoes. You can use other types of potatoes but since they’re not peeled (who has time for that?), the red potatoes look better and I think the skins add some earthy flavor. I bet Yukons would work well too. This will feed 20 people, easily, so scale back if you’re doing a smaller thing.

I dump the potatoes in a large stock pot and fill it with cold water. Dumping this many potatoes into boiling water would be splashy-burny, so just start with them in the cold water. Put it on the burner turned up high. It takes at least 30 minutes for them to cook through. You know they’re done when you can stick a fork in them easily. You can’t really overcook them though, so don’t sweat it.

While the potatoes are boiling, peel and press a bunch of garlic. I usually do about 5 or 6 cloves, but you can do more or less depending on how garlicky you like it. Also medium chop 3 or 4 regular yellow onions. Throw 3 sticks of butter in a sauce pan, melt the butter, then add the garlic and onion, season with salt (pepper comes later). Cook on medium heat, stirring every few minutes, until the onion is glassy and the mixture looks like a heart attack waiting to happen. That’s what you want. It should take about 15 or 20 minutes. If the potatoes need more time, turn the butter/onions/garlic down to a simmer since you don’t want the butter to solidify.

While the potatoes and butter/onions/garlic cook, grate the cheese. In addition to Gruyere I usually also use Emmental, a Swiss cheese. You want to mostly fill a medium bowl with grated cheese.

When the potatoes are done, drain them and put them back in the pot. Then pour the butter/onions/garlic mixture into the pot, and add everything else that makes these things awesome: the grated cheese, at least a pint of heavy whipping cream, a few big spoonfuls of sour cream, and more butter if you like. Also season well with salt and white pepper (black pepper is visually less appealing, but I’m sure would work) and add a bunch of dried thyme. Churn it all up with a potato masher.

You can serve them this way, or if you like a crust, you can throw them in a casserole pan and bake them in the oven for 20 minutes. I usually can’t wait that long.

Enjoy!

One Month With Windows

I’ve just spent four weeks using Windows at work, for the first time in about 6 years. Recently I’ve been working almost exclusively on Macs, with Linux for any deployed server processing.

However, before that I spent about 10 years on Windows, mostly as a SQL database engineer and occasional .NET programmer. And before that, I did a bunch of work on Macs. But that was in the olden days, when the internet was a mere babe, and many people received “catalogs”, which were strange little magazines printed on actual paper and delivered to your home for free, with the idea that you’d want to call (using a “telephone”) the company that mailed you the catalog, to buy things from them. My job in those days mostly involved weird desktop publishing programs like QuarkXPress, which apparently still exists!

The point of all of that was that I’ve used Windows, I’ve used Macs, and I’m old.

When I say I’m now using Windows at work I mean old school, Windows 7 Enterprise. This version of the world’s most popular operating system became generally available to the public on Oct 22, 2009. Ah, that first year of the Obama presidency… seems very long ago, indeed!

Surprise: overall, while I have been able to get semi-productive on Windows, I’m not a fan. I need a Mac. Perhaps I can whine loud enough and make it happen?

Before going further, I should probably list out the types of activities I’ve been doing on this system, so far:

  1. Creating a big presentation in PowerPoint 2010 which is part of Office 2010, made generally available on June 15, 2010.
  2. A bunch of emailing (of course) in Outlook 2010, also part of Office 2010
  3. Limited chatting using Slack and Office Communicator 2007R2
  4. Setting up AWS infrastructure with Terraform, using Visual Studio Code as the code editor and git for version control, of course
  5. Creating a Python data analysis environment using Anaconda, pandas, and such
  6. Establishing a bunch of project tracking stuff in Jira
  7. Documenting things in Confluence

So, to be fair, out of those 7 things, only 2 of them were seriously painful in Windows. The most painful was setting up an AWS infrastructure, because in order to do that I needed to set up a decent command shell and fight the proxy. By decent command shell, I mean:

  • Multiple shell tabs, like macOS Terminal
  • Ability to copy and paste code and command results from the editor the normal way (Ctrl-C, Ctrl-V)
  • Automatically resize the shell width when resizing the window
  • Command history with Ctrl-R search
  • Basic unix-y commands (ls, cp, mv, curl, ssh, more, head, cat, etc.)

I settled on Cmder with git for Windows plus gow for the basic shell environment. It’s wayyy better than the built-in cmd.exe, which is an abomination, even for 2009. But this setup still feels fragile, slow, clunky, and slapped-together. Which it is. But it does basically work.

However, running Terraform in the native Windows command line environment was a nightmare. It would hang periodically, requiring the entire shell to be killed in Task Manager. I don’t know if it was because of the proxy (probably so) but it was unusable.

So I decided to try an Ubuntu Virtual Machine running in Oracle VirtualBox, and managed by Vagrant. That took some time to set up, again mostly because of fighting the proxy. The key thing that made it work for me was setting the proxy environment variables inside the VM like so:

$ export HTTP_PROXY=http://10.0.2.2:3128
$ export HTTPS_PROXY=$HTTP_PROXY

What is that magic IP? It’s the IP address of the host machine (Windows in my case) from the perspective of the VM, when it’s running as a private network with NAT, which is the default networking.
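One way to sanity-check that address from inside the VM is to look at the default route, which under VirtualBox’s default NAT networking should point at 10.0.2.2. This is only a small sketch: the default_gateway helper name is made up for illustration, and it assumes the usual output format of the iproute2 `ip route` command.

```shell
# Hypothetical helper: read `ip route` output on stdin and print the
# address of the default gateway (the field after "default via").
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage inside the VM (assumes iproute2 is installed):
#   ip route | default_gateway
```

If the VM is switched to bridged or host-only networking, the route will look different and the proxy address needs to change with it.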

What is the magic port? It’s the default port exposed by Cntlm, which is running as a Windows service. Cntlm authenticates to the corporate proxy with my active directory credentials, then hosts a local proxy that does not require authentication. Authenticated proxies are a necessary evil of developing behind a proxy.
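Since some tools only read the lower-case variable names (curl, for one, ignores an upper-case HTTP_PROXY), it can help to set both spellings in one place. Here is a minimal sketch for the VM’s .bash_profile, assuming the same NAT address and Cntlm port as above; the NO_PROXY list is my own assumption and should be adjusted to whatever hosts need to bypass the proxy.

```shell
# Assumed values: VirtualBox NAT host address and Cntlm's default port.
proxy_url="http://10.0.2.2:3128"

# Export both upper- and lower-case names, since tools disagree on which to read.
for name in HTTP_PROXY HTTPS_PROXY http_proxy https_proxy; do
  export "$name=$proxy_url"
done

# Hosts that should skip the proxy (an illustrative list, not a recommendation).
export NO_PROXY="localhost,127.0.0.1"
export no_proxy="$NO_PROXY"
```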

Once I hooked that up, I could vagrant ssh into the VM and, generally speaking, use tools like awscli and terraform to talk to the internet. And I was in regular Ubuntu bash, with minimal lag. Occasionally Cntlm hangs and needs to be restarted, but other than that, so far so good. And of course with a mapped drive, which Vagrant sets up automatically, I can code on the Windows side and use command line tools on the Ubuntu side and it mostly works.

Mostly. I need a Mac.

 

Fighting the Proxy

Like many companies, mine uses an HTTP proxy to protect us from various security risks, such as malware, unintended data leaks, and the like. In general this is a great thing. I heard from one of our risk engineers that malware incidents went down substantially when they implemented this proxy.

However, as a developer trying to code behind this proxy, it’s a constant battle. I recently found develop-behind-proxy, which is a very good resource for this. In particular, knowing how to configure the tools you’re using is super important. There doesn’t seem to be very much agreement out there in how to configure tools to route http/https traffic through the proxy, especially when any kind of proxy authentication is involved.

The things I’ve tried have included:

  • setting HTTP_PROXY, HTTPS_PROXY, and similar environment variables
  • setting the lower-case versions of these vars (are some tools case-sensitive? I think so…)
  • putting my username and password in the proxy URL (e.g. http://user:pass@proxy.company.com:8080)
  • URL encoding my username and password
  • since this is a Windows environment, adding the domain and a backslash before my username
  • setting these in various places, like the Windows user environment variables, Cmder’s user-profile.cmd file, the .bash_profile inside the little Ubuntu VM I set up since some apps are either not available or break in weird ways in the Windows shell
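The URL-encoding step is easy to get wrong by hand, especially with a domain backslash in the username. Below is a rough sketch of how it could be scripted in plain POSIX shell. The urlencode helper, the CORP\jdoe username, and the password are all made up for illustration, and it only handles ASCII input.

```shell
# Hypothetical helper: percent-encode everything except unreserved characters.
# The body runs in a subshell ( ... ) so its variables stay private.
urlencode() (
  input=$1
  output=""
  while [ -n "$input" ]; do
    rest=${input#?}            # everything after the first character
    char=${input%"$rest"}      # the first character itself
    case $char in
      [a-zA-Z0-9.~_-]) output="$output$char" ;;
      *) output="$output$(printf '%%%02X' "'$char")" ;;
    esac
    input=$rest
  done
  printf '%s\n' "$output"
)

# Illustrative credentials only; the proxy host is the placeholder from the list above.
proxy_user='CORP\jdoe'
proxy_pass='p@ss:w0rd/1'
proxy_url="http://$(urlencode "$proxy_user"):$(urlencode "$proxy_pass")@proxy.company.com:8080"
```

Whether a given tool accepts credentials embedded this way is a separate question, and putting a password in an environment variable or profile file has obvious downsides.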

Of course I haven’t taken detailed notes so I’m not sure which config works for each tool! That’s on me though.

Oh, I’ve also tried Cntlm, and it has a nice little way to encrypt your username and password so that you can feel a little better about putting your credentials in a config file. But so far it hasn’t seemed to really help that much. The key thing I can’t quite figure out with it is how to reference it running as a Windows service from inside the Ubuntu VM.

The frustrating thing about this is that I get the feeling that all of this energy I’m spending configuring the proxy is taking away brain power and time from my “real work”.

But maybe not? Perhaps part of being a coder in a corporate setting is fighting the proxy? Perhaps we should put this in our job postings?

First Week

I just started a new job, and finished my first week. I’m working with people I’ve worked with before. However, it’s been about 6 years since we worked together, so it’s an interesting mix of familiar and new, both culturally and technically.

Culturally, processes are very open and there are lots of discussions, both hallway chats and meetings. Of course everyone has these, but the thing that sticks out in this culture is that generally the content is “real”. Meaning it’s focused on topics that are highly relevant to the work being done on the team. Things like who’s going to work on what? Why is this approach better? What are we actually going to do next? There’s very little tolerance of high-level hand-wavy atmospherics. Business-speak is rarely used and often scoffed at. I like it.

Being an engineer, I’m probably more interested in how the tech stack is both familiar and new. The familiar parts are the application architecture (mostly 3-tier) and the programming framework (.NET). The new parts are that there is more automation in general, and more rigorous security practices.

An even newer part of the stack is so new it doesn’t exist yet, because I am here to build it! That is, we want to build a data science stack, including all the big data stuff that entails. In short, to “bring in data science”.

Should be fun!

Just Write

I’ve been thinking about writing a blog for a while. Like maybe 20 years. Well maybe 15. When I first heard about blogs, I thought, “wow that’s a cool idea, but who has the time?”, which really meant “my thoughts are probably not interesting enough for that.” The latter is probably still true, on any given day, but I’m older now, so I don’t care as much!

Also I recently wrote a blog post for my former company about the API they are building, and it was pretty fun. And I wrote another one about the same company’s participation in the open-source Symphony Software Foundation. So there’s that.

What is this blog going to be about? Probably data, data science, data engineering, tech, maybe some financial services stuff because that’s the industry I’ve been in for a while now. Maybe other things I find interesting, you know the drill.

Ok, writing one post: check.