The Architecture Challenges of GDPR in 2018

GDPR is a regulation that requires businesses to protect the personal data and privacy of EU citizens for transactions that occur within EU member states, and non-compliance could cost companies dearly. This week on the NFJS podcast, I sit down with Jeremy Deane to discuss the architecture challenges of GDPR in 2018.

Full Transcript

Michael C.:                            You’re listening to the No Fluff Just Stuff podcast. This is your host, Michael Carducci, and we are at ArchConf, in Florida, ArchConf 2017. And this week I am joined by Jeremy Deane. Jeremy, why don’t you go ahead and introduce yourself?

Jeremy Deane:                   Sure. My name is Jeremy Deane. I work as chief architect at Foundation Medicine out of Cambridge, Massachusetts. I’ve also been associated with No Fluff for many, many years, both as an attendee and for the last seven years as a speaker.

Michael C.:                            This week we’re gonna talk a little bit about GDPR and some of the architecture challenges we’re gonna be facing in 2018. Can you give us a quick introduction? What is GDPR?

Jeremy Deane:                   GDPR stands for the General Data Protection Regulation, and it was established by the European Union in order to protect the rights of its citizens with regard to privacy. It goes into effect on May 25, 2018, which is right around the corner.

Michael C.:                            It will be there before we know it. For sure.

Jeremy Deane:                   It will, and many organizations that operate internationally, especially in the European Union, are gonna have challenges complying with these regulations. One of the harder ones to deal with is what’s called the right to be forgotten. And what that means is a European citizen can make a request to an organization and say, “I would like you to delete all of my data, to remove it from your systems.” From an architecture perspective, this is very hard. We don’t really do that in systems. We do soft deletes. We deactivate their account.

Michael C.:                            Yeah, and by design, that’s not even … there’s not even anything malicious about that. There’s a lot of reasons that we do things in that way.

Jeremy Deane:                   Right, and it gets even harder when we start copying that data and it makes its way down into business intelligence systems or other processing systems for billing and finance. You can see that your personal information in any system is replicated over and over and over again. Imagine the challenge for an architect or a technical lead to go through their enterprise and find out all the places where personally identifiable information is located and be able to extract that.

Michael C.:                            Now is it only the personally identifiable information or would it ultimately, if I said, I want to be forgotten, would all of that data, anywhere that I got pulled into some aggregation, would that need to be cleaned as well?

Jeremy Deane:                   That’s where it gets a little nebulous in terms of how much you have to delete. One thing that has been floated is: what if you were to de-identify the data and scrub it and keep the data, but make up some kind of anonymous name or some type of globally unique identifier and assign that? But again, you’re gonna have trouble with database constraints: names should be in this format, and addresses should be in this format, and phone numbers in this format. So sometimes even coming up with mock data to replace real data is a challenge.
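As a rough illustration of the scrub-and-replace idea described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the record layout, the `scrub_record` helper, the salt, and the `PHONE_FORMAT` constraint are hypothetical and stand in for whatever schema and validation rules a real system enforces. It shows placeholders that still satisfy a format constraint. It does not show real anonymization, since the untouched fields may still be enough to identify someone.

```python
import hashlib
import re

# Hypothetical format constraint a legacy schema might enforce on phone numbers.
PHONE_FORMAT = re.compile(r"^\d{3}-\d{3}-\d{4}$")


def _digest(value: str, salt: str) -> str:
    """Salted SHA-256 hex digest used to derive stable placeholder values."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def scrub_record(record: dict, salt: str) -> dict:
    """Return a copy whose direct identifiers are replaced by format-valid placeholders."""
    token = _digest(record["email"], salt)
    # Derive seven digits from the digest so the fake phone number keeps the expected shape.
    digits = str(int(token[:12], 16) % 10**7).zfill(7)
    scrubbed = dict(record)  # leave the caller's record untouched
    scrubbed["name"] = f"Anon-{token[:8]}"
    scrubbed["email"] = f"{token[:16]}@example.invalid"
    scrubbed["phone"] = f"555-{digits[:3]}-{digits[3:]}"
    return scrubbed
```

Deriving the placeholders from a salted digest keeps them stable across systems, so copies of the same person scrub to the same values. That stability is a design trade-off: it also means the placeholder acts as a pseudonym, not as a deletion.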

Michael C.:                            Sure, and I’ve definitely dealt with that over the years, but also talking about scrubbing the data, and de-personalizing the data, that’s a little harder than it sounds. I mean just looking at the … there was an AOL leak or what was it? There was a search engine, they had a huge data leak years ago. Do you remember this?

Jeremy Deane:                   Yeah.

Michael C.:                            And it turns out that even though your search history didn’t say Jeremy Deane, it was pretty easy to figure out just looking at it whose data was whose.

Jeremy Deane:                   Yeah. I’m from Massachusetts, and we have Mass 201, and it really says it’s not necessarily that there are these specific data points and if you de-identify those you’re fine. It’s really any pieces of data that can be pieced together to identify an individual-

Michael C.:                            And that is a difficult thing to quantify.

Jeremy Deane:                   It is. It is. Another challenge with GDPR, another regulation, is the right to transfer your data. This is anathema to competitive US organizations. Imagine if you were to say to Verizon, “Hey, I’d like you to transfer all my data to Comcast now,” and what they would say to that. But that’s exactly what the regulations are asking you to do. They don’t exactly say how you’re supposed to do that, but they do say that you have to be able to transfer the data to the organization the EU citizen requests.

Michael C.:                            So kind of going beyond that though, I know some of this is not new in parts of Europe. Like Germany has always had the most stringent regulations around the data that you collect. There’s a word and I don’t know … Neal, Neal Ford knows it. There’s a German word that was coined around this and again I don’t know the German word, but the literal translation is something like be parsimonious with the data you collect. We have a tendency to kind of gather as much data as possible and then decide later how to use it. Their philosophy is kind of the opposite: gather the minimum possible data. Are some of these things then kind of expanding out across the EU? Are we gonna see some of the challenges that we’ve seen over the past few years in terms of doing business in countries like Germany? Like I know Azure, for example, has had to stand up their own independent subsidiary in Germany that is owned by a German organization and German trust, and everything is wholly located in Germany, and that data center complies with all the laws that the other data centers don’t. Are we gonna see more of that in terms of the clouds we use and things like that? Is that gonna be an issue going into this?

Jeremy Deane:                   I believe so. Even Brazil is looking at copying the GDPR regulations put out by the EU. We joke about the great wall being put up along our border, if you will, in the United States, but there are virtual walls being put up all around the world. Every country, it seems, wants to keep its data local from a competitive standpoint, but also from a privacy standpoint. I think one of the things that, as architects and technologists, we’re gonna have to do is to really think about how we associate metadata with data so that we can properly tag and treat that data, and also be able to report back on how that data is being used on any request. You can imagine a regulator coming in and saying, “Okay, well, show me: did you tell the citizen how you’re going to use that data? Do you have a process for transferring the data? Do you have a process for forgetting the data?” Those are tough questions and I think Europe is just the start.

We’ve actually seen different states in the United States add in their own slant on privacy regulations. I think we’re only gonna see an increase. So we need to think about that when we’re designing our data models and our systems to have those capabilities by default. The real challenge for organizations is all of the existing applications. How do you go back and retrofit all of that and all that integration?
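One way to picture "associating metadata with data" in a new data model is to tag fields at the point of definition, then derive a report from the tags. The sketch below uses Python dataclass field metadata. The `Customer` model, the `pii` helper, and the purpose strings are hypothetical names chosen for illustration, not anything prescribed by GDPR or described in the conversation.

```python
from dataclasses import dataclass, field, fields


def pii(purpose: str):
    """Mark a dataclass field as personal data and record why it is collected."""
    return field(metadata={"pii": True, "purpose": purpose})


@dataclass
class Customer:
    customer_id: str
    email: str = pii("account login and billing notices")
    postal_address: str = pii("shipping")
    plan: str = "free"  # untagged: not treated as personal data in this sketch


def pii_report(model) -> dict:
    """Map each tagged field name to its declared purpose."""
    return {
        f.name: f.metadata["purpose"]
        for f in fields(model)
        if f.metadata.get("pii")
    }
```

Calling `pii_report(Customer)` yields the tagged fields and their stated purposes, which is the kind of answer a "show me how you use this data" question asks for. Whether a given field counts as personal data is a legal judgment the tags only record.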

Michael C.:                            Well one other thing that just kind of blows my mind a little bit is that I like being a software engineer for so long because a lot of these hand wavy policy things I never had to worry about. And I think this is just another indicator of the shift in our industry. I never had to be a security person. That was always somebody else’s job. But the reality is, as a software engineer, that is absolutely my responsibility, and other people’s responsibility. And now we have kind of this aspect as well, but we need to be aware of these changes, and I don’t know … I know you’re speaking here at ArchConf. Are you covering any of these things or is there something that really, you’re gonna be picking up into the new year?

Jeremy Deane:                   You know, there might be something. As we address it within my organization or my previous organization, it becomes part of something that I talk about, specifically around governance and compliance. Certainly that might be a preview for next year’s ArchConf, right?

Michael C.:                            Yeah.

Jeremy Deane:                   But I do touch on it a little bit in my architecture resiliency talk and trade-offs talk. But really it boils down to risk. If you were to try to comply with all the different regulations, whether it’s GDPR, SOC 2, HIPAA, and fully focus on that, you might release one feature a year. So what you need to think about is what are the key points or the most sensitive areas within your systems and within your enterprise that need the most attention, and then work backwards. But certainly, there are some practices that we can put in place from the get-go around privacy and security that will lead to systems that lend themselves to being compliant with whatever regulations come up in the future.

Michael C.:                            Yeah. So thinking about that, one of the challenges at first I believe is actually identifying in your systems where that data lives. I was recently at a talk or in a conference or in a keynote I believe it was, and somebody asked that question to the room. Do you know where your most sensitive, most critical data is, and who has access to it? And I think I could raise my hand; in a room full of 150 people, there might have been three hands.

Jeremy Deane:                   Yeah. That’s what I was getting at with the metadata. Sandy Pentland is an MIT professor and he writes a lot on privacy, and one of the things that he talked about or promoted a while back was the idea of a personal identity store where you put your personally identifiable information and then you point companies at that location. So you were in control of the information and you could always cut them off and tell ’em to stop using it.

Michael C.:                            You revoke a token or something like that.

Jeremy Deane:                   Exactly. I think within an organization, you need some type of central identity store. Something that is a pointer to all the places where you have it. Unfortunately, there’s really no easy way to do this. It’s a laborious activity of going back through these systems, understanding where they may have been developed 10 years ago, 5 years ago, where-

Michael C.:                            When none of these things were an issue.

Jeremy Deane:                   … Right. And actually identifying them, tagging them, which you can do in most persistence systems, and then having them point back to some type of central identity store or identity management store gives you the visibility that you need to comply with these regulations.
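The "pointer to all the places where you have it" idea can be sketched as a small in-memory registry. This is only an illustration under stated assumptions: `PiiRegistry`, its method names, and the per-system eraser callbacks are hypothetical, and a real identity store would be a durable, access-controlled service, not a Python dictionary.

```python
from collections import defaultdict


class PiiRegistry:
    """Tracks which systems hold personal data for each data subject."""

    def __init__(self):
        self._locations = defaultdict(set)

    def register(self, subject_id: str, system: str, location: str) -> None:
        """Record that `system` stores this subject's data at `location`."""
        self._locations[subject_id].add((system, location))

    def locations_for(self, subject_id: str) -> list:
        """Every known (system, location) pair for the subject, sorted."""
        return sorted(self._locations.get(subject_id, set()))

    def forget(self, subject_id: str, erasers: dict) -> list:
        """Run each system's eraser callback; return locations nobody could erase."""
        remaining = []
        for system, location in self.locations_for(subject_id):
            eraser = erasers.get(system)
            if eraser is None:
                remaining.append((system, location))  # needs manual follow-up
            else:
                eraser(location)
        if remaining:
            self._locations[subject_id] = set(remaining)
        else:
            self._locations.pop(subject_id, None)
        return remaining
```

Returning the locations that could not be erased makes the gap visible, which fits the point that the hard part is the laborious inventory of older systems.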

Michael C.:                            There’s a huge trend over the years with microservices and everything else, kind of moving more and more towards heterogeneous, heterogenous, I’ve heard it both ways, data stores and that becomes a little bit of a challenge because we’re breaking everything up. It’s a little harder to immediately see all the places that these relationships touch.

Jeremy Deane:                   Yeah, I think one of the things that I’ve seen certainly in container worlds is the idea of sidecars, and so you might be able to just add a sidecar to a container and the sidecar is responsible for listening to events emitted by that microservice having to do with this privacy information’s been captured, or this privacy information’s been updated, or this privacy information has been read. And then you have an audit trail of where it’s being used. Obviously you have to design the service to emit those types of events. But by thinking about microservices as sort of emitting the state transitions or even just the transactions that they have as events, you can then react to those and start to create the ways to manage this personally identifiable information.

Michael C.:                            And for the benefit of everybody listening, if they don’t know what a sidecar is, I don’t know if you can give us like a couple of sentences?

Jeremy Deane:                   Sure. I mean it’s a way of deploying a container along with another container, right?

Michael C.:                            Okay.

Jeremy Deane:                   Think of the old motorbike with the sidecar, where you’d have a passenger attached to your motorbike sitting in a separate cart. You know famous World War [inaudible 00:12:31] picture. Right?

Michael C.:                            Yeah. Yeah.

Jeremy Deane:                   And so, that’s really what a sidecar is, so the developer working on the microservice is really focused on that single-purpose feature that they want in the microservice, but the sidecar is being developed by a platform team and is something you can just attach and get capabilities, a capability injection if you will.

Michael C.:                            Sure.

Jeremy Deane:                   And things like a logging service, a security service, so that you’re not having to build these into the microservice.
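The event-listening sidecar described above can be approximated in a few lines. This sketch makes a big simplifying assumption: the service and the sidecar share an in-process `queue.Queue`, whereas real sidecars are separate containers talking over a network or a shared volume. The function and class names (`emit_pii_event`, `AuditSidecar`) and the event fields are hypothetical.

```python
import json
import queue
from datetime import datetime, timezone


def emit_pii_event(channel: queue.Queue, action: str, subject_id: str, fields: list) -> None:
    """Called by the service whenever it captures, updates, or reads personal data."""
    event = {
        "action": action,  # e.g. "captured", "updated", "read"
        "subject_id": subject_id,
        "fields": fields,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    channel.put(json.dumps(event))


class AuditSidecar:
    """Stand-in for a sidecar: drains emitted events into an audit trail."""

    def __init__(self, channel: queue.Queue):
        self.channel = channel
        self.trail = []

    def drain(self) -> int:
        """Consume all pending events and return how many were recorded."""
        recorded = 0
        while True:
            try:
                raw = self.channel.get_nowait()
            except queue.Empty:
                break
            self.trail.append(json.loads(raw))
            recorded += 1
        return recorded

    def usage_for(self, subject_id: str) -> list:
        """List (action, fields) pairs recorded for one data subject, in order."""
        return [(e["action"], e["fields"]) for e in self.trail if e["subject_id"] == subject_id]
```

The service only knows how to emit events. What happens to them (storage, reporting, alerting) lives entirely in the sidecar, which is the separation of concerns the platform-team model relies on.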

Michael C.:                            Is this a feasible strategy for going and retrofitting some of your legacy applications to bring some of the requirements to add in some capability or …

Jeremy Deane:                   Possibly. I think you’re still gonna have to do even what I would call noninvasive surgery, right? You’re gonna have to go into these applications and perhaps you can use some aspect-oriented programming where you’re not touching the original code but you’re emitting events from the code to track when personally identifiable information is used and how it’s used without modifying the old code. I think most systems, whether it’s Ruby, JavaScript, or Java, have some of those capabilities that you can use. Some of them might just be you have to go in and start modifying a data model. That can be particularly challenging depending on how that data model is maintained and was originally built.
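The "noninvasive surgery" idea, wrapping existing code so it emits tracking events without being edited, can be sketched in Python with a decorator applied from outside the legacy module. This is an analogy to aspect-oriented programming, not a full AOP framework. `tracks_pii`, `AUDIT_LOG`, and `load_customer` are hypothetical names, and the in-memory list stands in for a real event sink.

```python
import functools

AUDIT_LOG = []  # stand-in for wherever tracking events would really be sent


def tracks_pii(*field_names):
    """Wrap a function so each call records which personal-data fields it touches."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            AUDIT_LOG.append({"function": func.__name__, "fields": list(field_names)})
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Pretend this is legacy code we are not allowed to modify.
def load_customer(customer_id):
    return {"id": customer_id, "email": "placeholder@example.invalid"}


# The wrapping happens here, outside the legacy function's body.
load_customer = tracks_pii("email")(load_customer)
```

The legacy function body is unchanged; only the name binding is replaced at the integration point. Callers that imported the original function before the rebinding would bypass the tracking, which is one reason this approach still needs careful review of each application.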

Michael C.:                            Sure. So just to kind of bottom line some of this just a little bit, as much as I hate that turn of phrase, so this change is happening, going effective in May?

Jeremy Deane:                   Correct.

Michael C.:                            2018. And who is this going to affect?

Jeremy Deane:                   It’s gonna affect any company that does business in the European Union. Bear in mind that it really is all about risk. And if you are starting to become compliant with this regulation, an auditor is gonna be pretty lenient towards that. If at least you have a plan and you’re taking action, you don’t necessarily have to be completely compliant with all of your systems that interact with EU citizens on May 25. That would be completely unreasonable.

Michael C.:                            Sure.

Jeremy Deane:                   But if you at least have an action plan, you can lower that risk. Certainly, if you’re a high-profile company making lots of money, you’re probably gonna be a bigger target than a company that’s either making less money or is smaller or less well known.

Michael C.:                            And there have certainly been examples in the past of I want to say the European Union making an example out of high profile companies here in the US.

Jeremy Deane:                   Certainly a US company that has competition with European companies and that’s doing business in Europe is gonna be one of the first targets.

Michael C.:                            You’re gonna feel that too.

Jeremy Deane:                   Exactly.

Michael C.:                            And the other thing that I think is really important to point out like you said is that May, this legislation goes into effect in Europe. Brazil’s looking at the same thing and we’re gonna see similar legislation exploding all over the world in the coming months and years.

Jeremy Deane:                   That’s correct.

Michael C.:                            So this is something that even if you’re not currently doing business in the EU or looking to expand into that market, this is something that we’re all going to have to deal with at some point. Moving into building a new application, if you’re architecting new … what are the things that we need to keep in the forefront of our mind?

Jeremy Deane:                   I think absolutely capturing the metadata about personally identifiable information or private healthcare information. Any type of sensitive information, tagging that in your model, as well as taking a cue from the Germans, right? Not capturing as much information. Capturing only the information that you need for your business purpose. And then obviously the principle of least privilege …

Michael C.:                            Of course.

Jeremy Deane:                   … thinking about how not everybody needs to have access to this data and minimizing the number of systems or people that use that data. And finally, knowing at a moment’s notice where that data is captured and used within your enterprise is gonna be essential.
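Least privilege applied to data fields can be as simple as an allow-list per role, with unknown roles seeing nothing. The roles and field names in this sketch are hypothetical examples, and a real system would enforce this at the data-access layer, not in a helper function.

```python
# Hypothetical roles mapped to the only fields each needs for its business purpose.
ALLOWED_FIELDS = {
    "billing": {"customer_id", "email", "postal_address"},
    "analytics": {"customer_id", "plan"},
}


def view_for(role: str, record: dict) -> dict:
    """Return only the fields the role is allowed to see; unknown roles get nothing."""
    allowed = ALLOWED_FIELDS.get(role, set())
    return {key: value for key, value in record.items() if key in allowed}
```

Defaulting to an empty view for unlisted roles means a new consumer has to justify each field it wants, which mirrors the "capture only what you need" stance on the collection side.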

Michael C.:                            And the other big piece and the really thorny piece is these legacy applications that have been around for years or maybe decades, you’re going to have to retrofit that compliance. What is your advice on that?

Jeremy Deane:                   Start coding now.

Michael C.:                            Yeah.

Jeremy Deane:                   Honestly, I can’t imagine what you would do if you had a mainframe with a bunch of PL/1 programs.

Michael C.:                            Oh wow, that’s yes.

Jeremy Deane:                   How you would go back and retrofit these and add those extra data points. I know from experience with some of the companies I’ve worked at, modifying a mainframe might take months to do because of just the way the data flows through program, after program, after program, after program within these systems.

Michael C.:                            I remember having to do some maintenance 20 years ago on an AS/400 application and I don’t even know where most of the data was.

Jeremy Deane:                   Exactly.

Michael C.:                            But … Well I really appreciate that. Thank you.

Jeremy Deane:                   Sure.

Michael C.:                            These are things that we need to be aware of and this is for me one of the biggest value adds of ArchConf: just how comprehensive everything that gets covered is. It’s not just, okay, your microservices patterns and circuit breakers and feature toggles and somebody’s architecture patterns. It covers the breadth of skills that you need to be an architect. Soft skills. It covers leadership, covers these legal things. All these things that we have to be aware of. So thank you for being here and speaking. Enjoy this beautiful Florida weather and thanks for joining us today.

Jeremy Deane:                   Great, thank you very much.

Michael C.:                            Thank you.
