In this talk filmed at DevRelCon London 2017, septalingual Developer Experience Engineer Elmer Thomas explains how SendGrid rebuilt their seven SDKs (Python, PHP, C#, Ruby, Node.js, Java and Go) to support 233 API endpoints.
All right. So, just a little bit about me. I am classically trained in the dark arts of computer science. I did that for a bachelor’s degree at the fine University of California, Riverside, which is also a citrus-based university. I don’t know why these oranges keep following me. Then I went further down the rabbit hole of the dark arts and entered electrical engineering for a master’s degree, specifically working on GPS navigation and vehicle control. Then I became a failed, a.k.a. challenged, founder of several companies that all crashed and burned. The only thing I did that was somewhat successful was consulting for a while, using my software skills to help people manage their social media. And then one of my old college friends started this thing called SendGrid. I joined about six months in, as the fifth employee, and have worn many hats since then.
But really, why I’m here is because I became this: the septalingual member of the Maintainerati. Fancy talk for the crazy man who tries to maintain seven different languages across 20-some repos. So, how did my journey start? I’m going to blame Tim Falls. He’s the one who started our developer relations program at SendGrid. When he gave the presentation on what developer relations was, I said, “Yes. I want to do that. I will travel the world and write code. That seems amazing.” So I did that, and got burned out after about two years of traveling. So I was looking at what I could do next, and I was in a holding pattern. I was a hacker-in-residence for a while, where I basically did the same thing dev rel folks do, except without the travel. And then, finally, Matt came on board, switched from docs into this role of developer experience, and started this idea that we could have someone focus on the SDKs full-time. Prior to that, the dev rel team all shared that responsibility in our “free time”; we’d all just jump in whenever we could. But now, for the first time, we’d have someone, me, focused on just the SDKs.
So, these are the core client SDKs that SendGrid focuses on. These are the famous seven languages. You may notice that one of your favourite languages, Perl, is not on there, and that’s mainly because I don’t have the required beard, so they said I cannot touch Perl. So I decided, okay, fine. I’ll take the seven. But really, what we did is look at the volume of mail traffic going through each library, and these were the top seven in use. We had to make a cutoff at some point. Billions of emails flow through these SDKs per month. So, no pressure. You may have seen this picture earlier when Cristiano tore apart our stuff, and that’s how I felt when I looked at this challenge. I literally scoffed at the challenge and said, “We’re gonna do it.” Actually, I went into the fetal position for several weeks, and Matt had to come get me and say, “We’ve got to do this.” “Okay, okay, okay. We’ll do this.”

So, our first job: we had these 233 API endpoints, and we had a v2 SDK code base that only supported one endpoint. We needed to figure out how to support all of the endpoints in this new version of our API, across seven languages. So we got onto a spreadsheet, because that’s what we do, we estimate things, and we estimated it would take over eight years to do this by hand. Obviously, that’s not acceptable, and our managers made that face, like, “No. We’re not taking eight years to do that.” So the question became: to automate or not to automate? That question is pretty obvious. Of course we have to automate, but the deeper question is what to automate. That’s what I’m going to go through in this talk, and hopefully save you some time by telling you what we’ve learned about what should be automated, what shouldn’t be, and what should be lovingly crafted by hand.
We looked at the classic build-versus-buy question, and in our case “buy” meant open source tools. OpenAPI, Swagger at the time, had a ton of auto-generation tools. Some companies were just getting started generating client libraries; APIMatic was one of them. So why don’t we just do that and be done? We did a lot of research, and what we found is that we needed to build our own solution. There was a certain level of control that we wanted, and we wanted to move a little faster than we thought we could with the automated solutions. So, ultimately, we decided to build on top of OpenAPI. That became our holy grail, the specification to end all specifications, and that became possible through our friends at stoplight.io. They made it very easy for us to define things graphically and then spit out the OpenAPI spec, so we didn’t have to tediously write all of it by hand. They saved us a ton of time, especially through their auto-discovery feature: you set up your website behind a proxy, start clicking around, and it captures all the API calls for you. Then you just need to go in and fill in the details. Really, really big savings there. Of course, we stole from the best. We looked at companies like Stripe and Twilio, the ones everyone said nailed it, and we did tons of research. But at the end of the day, we had to roll up our sleeves and get going. So the first thing we looked at was: what can we automate here?
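To make "build on top of OpenAPI" concrete, here is a minimal Python sketch of generating cut-and-paste, low-level call snippets directly from a spec. The spec fragment is a hypothetical, heavily trimmed example, and the `client.request(...)` call it renders is an assumed SDK shape, not SendGrid's actual generator or library API.

```python
# HTTP verbs that OpenAPI allows as operation keys under a path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}

# Hypothetical OpenAPI fragment; a real spec would describe every endpoint.
SPEC = {
    "openapi": "3.0.0",
    "paths": {
        "/stats": {
            "get": {
                "summary": "Retrieve global email statistics",
                "parameters": [
                    {"name": "start_date", "in": "query", "example": "2017-01-01"},
                    {"name": "end_date", "in": "query", "example": "2017-02-01"},
                ],
            }
        },
        "/api_keys": {
            "post": {
                "summary": "Create API keys",
                "requestBody": {
                    "content": {"application/json": {"example": {"name": "My API Key"}}}
                },
            }
        },
    },
}


def generate_example(method, path, operation):
    """Render a cut-and-paste snippet for one operation of the spec."""
    query = {
        p["name"]: p.get("example")
        for p in operation.get("parameters", [])
        if p.get("in") == "query"
    }
    body = (
        operation.get("requestBody", {})
        .get("content", {})
        .get("application/json", {})
        .get("example")
    )
    args = [repr(method.upper()), repr(path)]
    if query:
        args.append(f"query_params={query!r}")
    if body is not None:
        args.append(f"request_body={body!r}")
    return "\n".join([
        f"# {operation.get('summary', method.upper() + ' ' + path)}",
        f"response = client.request({', '.join(args)})",
    ])


def generate_all(spec):
    """One snippet per operation, skipping path-level keys such as 'parameters'."""
    return [
        generate_example(method, path, operation)
        for path, operations in spec["paths"].items()
        for method, operation in operations.items()
        if method in HTTP_METHODS
    ]


if __name__ == "__main__":
    for snippet in generate_all(SPEC):
        print(snippet, end="\n\n")
```

Because every snippet is derived from the spec, adding an endpoint to the spec adds its example everywhere the generator runs; a per-language generator would swap only the rendering step.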
General documentation became an easy one, because the OpenAPI spec lets you put in a lot of information. We can store the documentation in there, which meant we could simultaneously generate documentation and a bunch of other code. I’ll get to some of the other things we document a little later. OpenAPI became our single source of truth. For integration tests, another shout-out to our friends at stoplight.io. They have a service called Prism. What it does is spy on Americans. Oh, no. That’s the wrong Prism. No, this Prism does something really cool: it takes your OpenAPI spec and makes a mock server out of it. This is huge, because before, and I had gone through two languages doing this, I was figuring out the mocking mechanism for each language and then defining the mocks, which was really tedious and painful. Now I didn’t have to do that. I had one central source of truth. It spins up a fake SendGrid, and we now have that running in Docker. For example, if you contribute to our Python library, you download the Docker image, which spins up the mini fake SendGrid. Then you can get started without an API key: you just start making API calls, and we provide all the samples and everything. Also, when we push code up to GitHub, we run Prism there too and do the integration tests, without any hand mocking across all the languages. We have one source of truth for that.
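The following is not Prism; it is a standard-library Python sketch of the same idea, assuming each operation in the spec carries an example response. The server answers requests with those documented examples, so an integration test can run against a fake API instead of hand-written mocks. `MOCK_SPEC`, `find_example`, and `start_mock_server` are hypothetical names.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical spec fragment: the operation documents an example response.
MOCK_SPEC = {
    "paths": {
        "/scopes": {
            "get": {
                "responses": {
                    "200": {"content": {"application/json": {"example": {"scopes": ["mail.send"]}}}}
                }
            }
        }
    }
}


def find_example(spec, method, path):
    """Return (status, example body) of the first documented response, or None."""
    operation = spec["paths"].get(path, {}).get(method.lower())
    if operation is None:
        return None
    status, response = next(iter(operation["responses"].items()))
    return int(status), response["content"]["application/json"]["example"]


class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        match = find_example(MOCK_SPEC, "GET", self.path.split("?")[0])
        status, body = match if match else (404, {"errors": [{"message": "not in spec"}]})
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_mock_server():
    """Serve the spec's examples on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), MockHandler)  # port 0 = let the OS pick
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_mock_server()
    port = server.server_address[1]
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/scopes") as resp:
        print(resp.status, json.loads(resp.read()))
    server.shutdown()
    server.server_close()
```

The design point matches the talk: the mock's behaviour comes from the spec, so one fake server serves the test suites of every language.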
Examples. That same OpenAPI spec allows us to create cut-and-paste examples, like some of the ones you saw earlier, where you say, “Okay. I wanna call this endpoint to get all my statistics between this month and that month,” and we generate all of that code for you. The low-level code, too. By that I mean the code you use just to make the basic API call. I’ll get into the helper code, the layer on top that abstracts away some of the details, a little later, but the low-level code you can totally automate with OpenAPI.

One of my favorites is the CLA, the contributor license agreement. Once you start growing up a little bit, the legal department will come to you and say, “Hey. People need to start signing some legal stuff, so that we don’t lose all this hard work that we’ve done.” And we’d had some bad experiences. Personally, I had to make a pull request for some Azure docs. I got super excited because it was on GitHub: “Cool. A PR on Azure. It’s gonna be amazing.” Then, once I tried to make that PR, it said I needed to get my CLA co-signed by legal counsel, and I had to fax it in. I was like, “This must be the punishment for Microsoft engineers who break the build: they have to man the fax machine.” I don’t know. Then our good friend Ed Janeski, who used to be in dev rel with us and handed the SDKs to me before he left, was going to make a pull request. He saw that we needed the CLA, and he said, “You guys aren’t Facebook.” So I said, “Okay. I’m gonna see what Facebook does.” I looked, and to contribute to Facebook’s projects, all you need to do is fill out an online web form, authenticate with GitHub, and you’re done. I wanted that. So we began to investigate, and we found a service called cla-assistant.io. It’s open source, written by SAP, and it lets you do that magic, fully integrated with GitHub.
So now, if you come to our repo and make a pull request, and you haven’t signed the CLA before, an auto-generated comment appears saying, “Hey, you need to sign the CLA.” Once you click the button to sign it, you get a simple customised form, authenticate with GitHub, and boom. Now you can contribute. We have a record of the transaction, the comment gets updated to say that the person has signed the CLA, and they’re ready to rock and roll. That saved us so much time, because with our previous method, contributors had to download a PDF, sign it, and email it back to us. Marginally better than faxing stuff.
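cla-assistant.io runs this flow as a hosted service. Purely to illustrate the decision it automates, here is a hypothetical Python handler for a GitHub `pull_request` webhook payload. The in-memory `SIGNED_CLA` set and the `CLA_URL` placeholder stand in for real storage and a real signing form, and posting the comment back to GitHub is left out.

```python
# Hypothetical record of GitHub logins that have already signed the CLA.
SIGNED_CLA = {"octocat"}
CLA_URL = "https://example.com/cla"  # placeholder for the signing form


def cla_comment_for(event):
    """Return a comment asking the PR author to sign, or None if nothing is needed."""
    if event.get("action") not in {"opened", "reopened", "synchronize"}:
        return None
    author = event["pull_request"]["user"]["login"]
    if author in SIGNED_CLA:
        return None
    return f"Hi @{author}, please sign our CLA before we can merge: {CLA_URL}"


def record_signature(login):
    """Called after the contributor authenticates with GitHub and submits the form."""
    SIGNED_CLA.add(login)
```

Keying the record on the GitHub login is what removes the PDF round-trip: the same identity that opens the pull request is the one that signs.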
Swag. We love to swag people, because we have these nice, fancy, soft, unicorn-skin shirts. They’re fantastic and people love them, but giving them out via our open source repos was very tedious. We’d have a spreadsheet, people had to email us their personal information, then I had to enter it in the spreadsheet, then I had to send out an order. Highly irritating. It made me not want to swag anybody out. It’s like, no swag for you. But then we discovered our partner who also does the swag for Hacktoberfest, Kotis, K-O-T-I-S. They provide an API for their swag. Ooh, swag as a service. Love that. So now, when someone’s pull request is merged and we haven’t sent them swag before, they authenticate with GitHub, we check that they match the author of that pull request, and they get a form to fill out with their information. Then we get an email saying, “Hey. Somebody’s requesting swag.” We verify it and click: yeah, they’re cool, they can have swag. The API call goes out, and magically one of our shirts, a sticker, and a hacker pin show up at their door. It’s really amazing. So now, you want swag? This swag, it’s all for you.
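A minimal sketch of the eligibility checks described above, assuming the merged pull request's GitHub payload and the requester's authenticated login are available. `SwagDesk` and the `place_order` callback are hypothetical; the callback stands in for the swag vendor's order API, whose actual interface the talk does not cover.

```python
from dataclasses import dataclass, field


@dataclass
class SwagDesk:
    already_sent: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def request_swag(self, pr, authenticated_login, shipping_info):
        """Queue a request if the PR is merged, owned by the requester, and not yet rewarded."""
        author = pr["user"]["login"]
        if not pr.get("merged"):
            return "rejected: pull request not merged"
        if authenticated_login != author:
            return "rejected: GitHub login does not match PR author"
        if author in self.already_sent:
            return "rejected: swag already sent"
        self.pending.append((author, shipping_info))
        return "pending approval"

    def approve(self, author, place_order):
        """Human verification step; only then is the (hypothetical) order placed."""
        for i, (login, info) in enumerate(self.pending):
            if login == author:
                del self.pending[i]
                place_order(info)
                self.already_sent.add(author)
                return True
        return False
```

Keeping `approve` as an explicit human step mirrors the email-and-click verification in the talk; everything before and after it is automated.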
Let’s see. The last thing is GitHub interaction. GitHub has a wonderful webhook mechanism, and we’ve automated a whole bunch of things around webhooks: when someone opens a PR, opens an issue, or makes a comment, and when we label things. A quick example: if someone opens an issue that’s clearly a support issue, I can just apply a “support” label, and we automatically post a comment directing them to our support channel. Before, I used a TextExpander snippet and did this by hand. It took a couple of seconds, but it became annoying. It’s a lot faster just to apply a label, and later, if we need to change that template, anyone in the company who has access can change it. So those things, yes, automate.
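GitHub's `issues` webhook fires with `"action": "labeled"` and includes the `label` and `issue` objects, which is enough to drive the label-to-reply automation. Here is a hypothetical Python handler that returns the issue number and comment body; a real integration would then post the body through GitHub's REST endpoint `POST /repos/{owner}/{repo}/issues/{issue_number}/comments`, omitted here. The template text and support URL are placeholders.

```python
# Hypothetical label-to-template mapping; editing this dict changes the canned reply.
LABEL_TEMPLATES = {
    "support": (
        "Thanks for reaching out! This looks like an account-specific question, "
        "so please contact our support team: {support_url}"
    ),
}
SUPPORT_URL = "https://support.example.com"  # placeholder


def reply_for_label_event(event):
    """Return (issue_number, comment_body) for labels that have a template, else None."""
    if event.get("action") != "labeled":
        return None
    template = LABEL_TEMPLATES.get(event["label"]["name"])
    if template is None:
        return None
    return event["issue"]["number"], template.format(support_url=SUPPORT_URL)
```

Storing the templates as data rather than in someone's TextExpander is what lets anyone with access update the wording.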
So, what should you not automate? Well, the HTTP client. When we did an inventory of what we currently had, we found that we had a lot of dependencies. Right around that time, the whole left-pad incident happened. So we thought: while we have the hood up, how can we go in there and fix some things? One thing we decided was that, whenever possible, we would use the native HTTP clients. That way, we wouldn’t have any dependency issues. This was particularly troublesome for languages like PHP, where we were using things like Guzzle, and people would send us things like, “Well, your Guzzle doesn’t work on Symfony version 5.7,” and all that stuff. It’s like, you know what, it’s too much. So we decided to just use the native clients, because our API is pretty simple. We don’t need a lot of the advanced functionality, and the advanced functionality we do need, we’ll build as needed. And that’s what we did.
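What "use the native HTTP client" can look like in Python: a sketch built only on the standard library's `urllib.request`, with no third-party dependencies. `MinimalClient` is a hypothetical name, not SendGrid's library; the v3 base URL and the Bearer-token header follow SendGrid's public API conventions. Passing `method=` to `urllib.request.Request` lets the standard library send any verb, PATCH included.

```python
import json
import urllib.parse
import urllib.request


class MinimalClient:
    """Standard-library-only HTTP client sketch; no third-party dependencies."""

    def __init__(self, api_key, host="https://api.sendgrid.com/v3"):
        self.host = host.rstrip("/")
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def build_request(self, method, path, query_params=None, request_body=None):
        url = self.host + path
        if query_params:
            url += "?" + urllib.parse.urlencode(query_params)
        data = json.dumps(request_body).encode("utf-8") if request_body is not None else None
        # `method` lets urllib send PATCH, PUT and DELETE, not just GET and POST.
        return urllib.request.Request(url, data=data, headers=self.headers, method=method.upper())

    def send(self, method, path, **kwargs):
        """Perform the request and return (status, decoded JSON or None)."""
        req = self.build_request(method, path, **kwargs)
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return resp.status, json.loads(raw) if raw else None
```

Pointing `host` at a local mock server lets the same client run in tests without a real API key.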
So I believe you should lovingly craft your HTTP clients whenever possible. The only language where this wasn’t possible was Java, because Java doesn’t believe in PATCH, apparently: its built-in HttpURLConnection doesn’t support the PATCH method, which was kind of strange. So most people use the Apache HttpClient there, and so do we.

The helpers. I mentioned the low-level code. The next level is creating wrappers around that code to make it super easy. The code that Cristiano showed in Ruby is actually not the final state; that’s more the lower-level code. We’re working on revamping all of our languages so that they have that helper level. One of the languages we’ve finished is C#. And I have a funny C# story a little later. It involves stabbing, and I’m still recovering from that.
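A minimal sketch of the helper layer, assuming a generated low-level call shaped like `low_level_request(method, path, request_body=...)`. The `Mail` class and `send_mail` function are hypothetical; the JSON they build follows the documented shape of SendGrid's v3 `/mail/send` request body (`personalizations`, `from`, `subject`, `content`).

```python
from dataclasses import dataclass


@dataclass
class Mail:
    """Hypothetical helper that hides the raw v3 /mail/send JSON shape."""
    from_email: str
    to_email: str
    subject: str
    plain_text: str

    def to_request_body(self):
        return {
            "personalizations": [{"to": [{"email": self.to_email}]}],
            "from": {"email": self.from_email},
            "subject": self.subject,
            "content": [{"type": "text/plain", "value": self.plain_text}],
        }


def send_mail(low_level_request, mail):
    """`low_level_request` stands in for the generated low-level API call."""
    return low_level_request("POST", "/mail/send", request_body=mail.to_request_body())
```

The split matches the talk: the low-level call can be generated from the spec, while the helper, which decides what a pleasant API looks like in each language, is written by hand.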
README. The README is so critical, just like Cristiano showed you. When someone comes to your site for the first time, that README needs to explain really quickly how to get started and get them to their first API call as quickly as possible. We did a survey, looked at many different READMEs, and tried to collect the best practices from each one. Then we go back as customer zero, like Ade talked about earlier today, and think, “I’m coming into this process for the very first time. Does this make sense?” We use onboarding to help with this as well. When new SendGriders come in, we find out which language is their specialty and ask them to make an API call in that language, and I love to actually sit with them physically and watch what they do. I’ve learned so much just watching where they click and where they go, because you can’t do that over email or text. It just doesn’t work.
Getting started. After the README, we took a step back and said, “You know what? We need to define a set of use cases to get people started quickly.” There were certain common things people did with our API, and we needed to make sure all of those were defined and super, super easy. So that’s something you should look at: how do people get started with your API, and how can you get them to that first API call as quickly as possible? One of the things I like about Microsoft’s OneNote API team is that they talked about a metric they tracked: time to first API call. I love that. That’s an awesome way to think about it. How quickly can you get people using your stuff?
Use case docs and troubleshooting docs. Use case docs are basically an expansion of the getting started docs. You’ll start to get more questions from the community about more things people want to do with your API, so you should have a document that describes all the most common use cases and how to do each one in each of the languages. Then troubleshooting docs: you’re going to get certain questions over and over again, and you’re going to get tired of writing the same answers. So you want troubleshooting docs that you can point people to.
And contributing docs. Obviously, when you’re running a bunch of SDKs, one of the ways we maintain them, and maintain our sanity, is through contributors. So you should make contributing as simple as possible. One way is Dockerizing things and providing very simple ways to use the API, like using the Prism executable so contributors have a working version of SendGrid without even needing an API key. It’s little things like that that make people able to contribute, along with automating all of the Travis CI stuff so people don’t have to worry about it.
And unit tests. You want to lovingly craft those, because you can’t automate all the edge cases. You want a real engineer to sit down and think those things through. And finally, your SemVer versioning. Don’t automate this, because you’ll probably make a mistake. It’s easy to make mistakes even when you do this by hand, but, you know, we’re people. One thing we found is that a lot of people just ignore SemVer. They don’t care that you flagged a breaking change with a major version; they care that it broke their code: “I don’t care that you did a major version, but why did you break my code?” So it’s very important to be clear about how your versioning system works, and to gain developers’ trust by letting them know that a minor point change won’t break their code.
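Consistent with the advice to keep versioning a human decision, the sketch below leaves the judgement of whether a change is breaking as a manual input; it only writes down the Semantic Versioning bump rules so the contract with developers is explicit. `next_version` is a hypothetical helper and handles plain `MAJOR.MINOR.PATCH` strings only (no pre-release or build metadata).

```python
def next_version(current, change):
    """Next SemVer for a human-chosen change type: 'breaking', 'feature', or 'fix'."""
    major, minor, patch = (int(part) for part in current.split("."))
    if change == "breaking":   # incompatible API change: bump MAJOR, reset the rest
        return f"{major + 1}.0.0"
    if change == "feature":    # backwards-compatible addition: bump MINOR, reset PATCH
        return f"{major}.{minor + 1}.0"
    if change == "fix":        # backwards-compatible bug fix: bump PATCH
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, `next_version("5.6.2", "feature")` gives `"5.7.0"`, which is the promise a minor point change makes: existing code keeps working.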
So, community and collaboration. The good stuff. All right. When we rewrote all of the libraries, everybody told us we needed to do it collaboratively, right? So we put a notice in big, bold red letters on each one of the repos. Matt wouldn’t let me use an animated GIF. I wanted to use the blink tag, but he wouldn’t let me. But we did that on all the repos, and we said, “Look, we’re about to rebuild all this stuff. We’re shaking it up. We want your feedback. This is your opportunity to help direct the future of our API.” And then, crickets. Nobody said anything. I don’t think there was even one comment. It was just bad. But obviously, we can’t just sit there and wait for somebody to say something. The best way to get feedback is to start deploying stuff. Start breaking people’s things, and boy. Rebel, rebel, rouse, rouse. People woke up. We had some epic threads that were like 40 comments deep. I’m like, “Whoa, where did these people come from? We were looking for them earlier.” So if you wanna get some comments, do a major breaking change, and you will definitely get them. What we found once we’d gone through this process is that it’s very important to document who those vocal people are, and who the contributors are. Now, when we wanna make a change, we specifically call those people out, and it’s amazing how fast they respond. When you call someone out specifically, it’s like you’re talking directly to them. You’re giving them the respect of saying, “Hey, your opinion is important to me. I want to know.” Versus before, when it was kind of a blanket “Everybody, give us feedback.” Now we’re saying, “You, specifically, we want your feedback,” and that’s worked very well for us.
So, just a couple of quick examples. I call this one “dynamic-gate”, or “the stabbing”, because I think about what happened to Jon Snow when the Watch came for him. I felt like that’s what happened to me with several of these communities. What happened was this. As we were automating the processes, I grouped the different languages into two buckets: dynamically typed languages and statically typed languages. And somehow, I thought C# should go in the dynamic bucket. No, don’t do that. You’ll piss them off, and there will be fire and brimstone. That was one of the most interesting threads I’ve ever taken part in. People told me I didn’t deserve to be born, and all sorts of things. And I was like, maybe you’re right. I started questioning my life: damn it, maybe they’ve got something. But what I learned from that whole process is that if someone’s passionate enough to say such things, then listen to them. If they tell you, “You suck,” say, “Awesome. Why do we suck? Tell me specifically.” Like Cristiano: he told us some very specific things, and we’re going to go back to our offices and get those things fixed. That’s super valuable, that’s huge, and that’s what you want to do with all of these people. Another example: one person in the .NET community decided our stuff sucked so badly that he was just going to fork it and create his own version. And it was amazing, because people started gravitating toward his version, so we basically copied what he did, brought him back into the fold, and gave him the credit. Now he’s constantly going in and solving issues before I even can. So you want to look for those powerful advocates who know their language inside and out. You don’t want to dismiss them just because they were negative in the beginning, because sometimes they may just have been having a bad day.
We’ve all had bad days, where you’re trying to get stuff done and the API you’re using does things backwards. Of course you’re going to be angry.
And finally, another example is a gentleman in New Zealand named Adam. He rewrote our entire Node.js library. It was amazing. Your first inclination may be, “Well, this is my baby. I’m the one who writes the library.” But he wrote it and the community loved it, so it was folded directly back in, and he’s now the main contributor to that library. That sort of thing happens all the time. The final anecdote is about a person on Twitter. He basically told me I sucked, that I should never program again, and all sorts of crazy stuff. So I reached out to him and said, “Hey. I wanna know exactly why I suck. Tell me.” We went back and forth on Twitter a little bit, and finally I said, “You know what, let’s talk about it. Can you call me?” We had a two-hour conversation, and he told me later that the only reason he talked to me was that he went to my GitHub profile, saw I was managing a bunch of languages, and felt a bit sorry for me. In those two hours he helped us tremendously to understand where we had a whole bunch of shortcomings, and there’s no way I could have gotten that without talking to him on the phone, even though he was in Australia. It was a beautiful thing. The more you can do that, the better your product will be. Matt will talk a little more about this later, because that’s what he does now: he talks with developers day in, day out, and we’ve learned so much from that effort.
So, how do we do all of these things? How do we prioritise them? Matt has an awesome blog post called “Double Your Velocity without Growing Your Team with RICE.” RICE is an acronym for Reach, Impact, Confidence, and Effort. You should read his post, in which he references the Intercom blog post that explains how it works. I’m running short on time, so definitely check that out for yourself, but that’s how we manage everything. The short story is that we were using Jira to manage our stuff before, and what we found is that with backlogs of 400 or 500 items, you can’t manage them effectively in Jira. You can’t tell what’s next. So we had to dump everything into a spreadsheet and apply the RICE formula so we could see what we should be working on next.
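The RICE score from Intercom's write-up is reach × impact × confidence ÷ effort, with impact on a 3 / 2 / 1 / 0.5 / 0.25 scale, confidence as a fraction (1.0, 0.8, 0.5), and effort in person-months. A spreadsheet column does the job; the same calculation in Python, with made-up backlog items, looks like this:

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    reach: float       # e.g. users or events affected per quarter
    impact: float      # Intercom's scale: 3, 2, 1, 0.5, 0.25
    confidence: float  # 1.0, 0.8, or 0.5
    effort: float      # person-months

    @property
    def rice(self):
        return self.reach * self.impact * self.confidence / self.effort


def prioritise(items):
    """Highest RICE score first: the top item is what to work on next."""
    return sorted(items, key=lambda item: item.rice, reverse=True)


if __name__ == "__main__":
    backlog = [
        BacklogItem("Fix flaky integration test", 500, 1, 0.8, 2),         # RICE 200
        BacklogItem("Add missing helper for mail send", 100, 3, 1.0, 0.5),  # RICE 600
        BacklogItem("Improve troubleshooting doc", 2000, 0.25, 0.5, 1),     # RICE 250
    ]
    for item in prioritise(backlog):
        print(f"{item.rice:8.1f}  {item.title}")
```

Dividing by effort is what lets a small, high-impact item outrank a large one with broader reach.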
Now, this other stuff. I think I’m going to save this for you, because I just have one minute. What I wanted to give you is a day in the life of what I do: an idea of how it’s possible to manage all these things, and everything I do in a typical day. You’ll have the slides later, and you can ask me questions about any of these specific things. Basically, I do Pomodoro-driven work. I have to be focused, or else I cannot get these things done. Every day, I have a certain list of things I need to do that goes beyond the prioritised backlog I was talking about, because if I don’t do these things, they’ll fall through the cracks, and things like turning in certain documents so I get paid will not happen, and that’s no good, right? I also use a template for meetings: since we don’t have much time, meetings have to be super efficient, so we use this meeting template to get things done. And finally, at the end of the day, I have a checklist to make sure I prepare for the next day, so that as soon as I get back into the game, I’m ready to rock and roll. You’ve got to be super efficient when you’re managing so many things. So even if you don’t use my checklist, have a checklist. It’ll save your sanity on those days when a whole bunch of people are pinging you and needing your help. That’s it. Thank you very much. Enjoy lunch. I’m @thinkingserious on Twitter. If you have any further questions, happy to help.