#20 Raluca Crisan – Data Scientist & Co-founder ETIQ on AI ethics and mitigating algorithmic bias

Raluca is a co-founder at ETIQ, a start-up that tackles unintended bias in automated decision making, and she has spent the past 10 years working in data science and analytics across a variety of sectors. She has a BA from Amherst College in Massachusetts and an MA from the University of York. In 2020 Raluca won the prestigious Women in AI award. Raluca’s experience spans managing large teams to hands-on data product development.


Ciprian Borodescu [00:00:01] Hey, everybody, and welcome to Get Your AI On! the podcast. I’m Ciprian Borodescu, and this podcast is brought to you by Algolia, the AI-powered search and discovery platform. I’m the host of the show, and every episode I invite founders, entrepreneurs, business leaders and even AI researchers to share with us their experience in dealing with business problems that can be solved through intelligent use of data. This is episode number 20. Let’s get your AI on! 

Ciprian Borodescu [00:00:38] I’m here with Raluca Crisan – data scientist, co-founder of ETIQ, and winner of the Women in AI Awards 2020. I’m super excited and it’s an honor to have you on this podcast. Thank you so much for being here.

Raluca Crisan [00:00:53] Yeah, thank you so much for having me. I was looking forward to this and also I was quite excited when I first heard about it. I think it’s great to set up something like this between founders.

Ciprian Borodescu [00:01:09] Well, let’s start soft: How are you today, Raluca? How do you feel? 

Raluca Crisan [00:01:16] Yeah, I’m good. It’s super cold here, minus one or two degrees. So I know it doesn’t sound cold, but I feel pretty cold. 

Ciprian Borodescu [00:01:26] Now, it is almost spring here in Bucharest. But yeah, I’ve seen that the weekend is going to be kind of like what Western Europe experienced this past week. 

Raluca Crisan [00:01:40] Yeah, I think my mom was telling me it’s going to snow again, isn’t it? 

Ciprian Borodescu [00:01:47] We’re kind of used to it here – when March comes and it should be spring in Romania, usually it’s still the middle of winter.

Ciprian Borodescu [00:01:56] Let’s now introduce ETIQ: ETIQ helps financial companies identify and eliminate bias within their AI algorithms. ETIQ is a London-based startup and you recently raised close to $800K in pre-seed funding, which includes equity investment from SFC Capital and grants from Innovate UK.

Ciprian Borodescu [00:02:21] And before diving into the details of how and why you built ETIQ, I’d like to invite you to tell us more about yourself, why data science and how you started your career in this space. 

Raluca Crisan [00:02:36] Yeah, sure. So I’ve been in the data science space for roughly 10 years now. I think I kind of fell into it. To be honest, I have a slightly unusual past in that I did economics and English, I took a lot of math and statistics, and I always found that that was my focus.

Raluca Crisan [00:03:06] But then I did a master’s in medieval English literature, which was a lot of old languages. And then I was just looking for something more real world and applied. 

Raluca Crisan [00:03:21] But I think, you know, using a lot of languages and different kinds of languages, it’s a similar learning process. And there’s also that analysis component, which I think is quite key to data science, just interpreting things. It’s odd, you know, but some of these disciplines are more similar than you’d think. Yeah, that’s my odd fact of the day!

Ciprian Borodescu [00:03:53] And when exactly did you write your first line of code? 

Raluca Crisan [00:04:01] What year was it? 2008! Yes, it was quite a while ago. I don’t think that data science was like a name that people had for it. 

Ciprian Borodescu [00:04:14] For sure it wasn’t as sexy and trendy as it is today.

Raluca Crisan [00:04:23] It was not! The first package I was using was SAS, which was a long time ago! But then I was always interested in moving into a newer language, because SAS was never open source or anything like that, so it was quite prohibitive for small companies or startups. It was very hard to do this, really. So Python was kind of the obvious choice at the time.

Ciprian Borodescu [00:05:04] So would you say as a Pythonist that the world is divided into R people and Pythonists? 

Raluca Crisan [00:05:12] I feel like R is really losing steam. I think maybe five or six years ago people were really keen on R. At the time I was trying to push my Python agenda – maybe six or seven years ago – and people were like “What, no, we’re just going to use R, we love it!”. But now things are much more even, I think.

Ciprian Borodescu [00:05:37] What is ETIQ, what does it do and why is it important for the world we live in today?

Raluca Crisan [00:05:46] That’s a great question. The software itself is meant to help you identify and mitigate algorithmic bias. That means if you have an automated decision system that has a predictive model or maybe an ML/AI component at its core, sometimes (quite often, actually) this component can be biased against a certain demographic group through no one’s fault. When that happens it’s important that it’s remedied. Otherwise, in the classic examples such as loan decisions, recruitment or facial recognition, it doesn’t really work for the 10 percent of the population that happen to have a different skin color. That means that it doesn’t really work, and I think that’s the challenge. Not only when it comes to facial recognition, but with the other types of problems that we’re trying to overcome.

Ciprian Borodescu [00:07:01] OK, and what is the cost of not using ETIQ or more generally speaking, what’s the cost of not dealing with biases in AI algorithms? And I guess this is kind of like the question that you also got during your fundraising process, but also when dealing with customers, especially in the financial industry. 

Raluca Crisan [00:07:23] Yes, for sure. I’m going to start with the company-level costs, which are partly social, not just commercial: there are actual fines. Not everyone knows about them, but occasionally there’s a compliance risk, which is quite large actually. I think especially financial institutions that have had issues in the past are really keen to avoid that. The risk for them is really big. Sometimes it goes beyond the fine, into a review of all their operations and so on, because originally it’s a discrimination law issue. You shouldn’t discriminate against women or black people. I mean, that’s really bad.

Raluca Crisan [00:08:18] That’s one level cost. Another level is the PR cost since the public really doesn’t like these kind of things and there’s much more focus on the equality agenda globally. That is reflected quite negatively in the way people look at companies that don’t take this seriously.

Raluca Crisan [00:08:52] From a human point of view, we started this coming out of a social-impact-type accelerator, and for us it’s a question of fairness. I mean, this shouldn’t really be happening. It’s a huge opportunity cost not to hire the right person, or to miss a section of your population that you could be giving credit to. Why would that happen? It’s unfair and it’s commercially detrimental as well.

Ciprian Borodescu [00:09:26] If we go into the philosophical aspect of the issue, I’m wondering, is bias an AI issue or a human issue? Probably it’s both, but if that’s the case and if we can engineer our way out of AI bias, maybe not fully, of course, but partially, when it comes to human bias: what options do we actually have here? I know this is not a technical question it is more philosophical, like I said, but I would really love to hear your thoughts on that. 

Raluca Crisan [00:10:10] I’m going to answer by taking a slight detour. We hear a lot of this argument, especially in the EU … not outlawing AI or ML, but regulating it very harshly for certain purposes. Essentially saying you shouldn’t really be using this type of decisioning for some decisions. We thought quite a lot about that, the philosophy behind it, because I personally don’t want to spend my time on something that is fundamentally flawed. 

Raluca Crisan [00:10:47] I think that ML can bring a bit more equality in the end. I think you’re actually right: the biases in humans are huge. And one of the benefits of things like machine learning is that maybe it can remove some of those biases. Of course, if it’s biased itself or if it amplifies them, that is worse, but I think in the end it can be used to actually make things more equal.

Ciprian Borodescu [00:11:14] Yeah, even making us aware of our own biases, right?

Raluca Crisan [00:11:17] Yeah, that’s exactly right. This is a very bad analogy, but if you just look at the pure numbers, less than 10 percent of the startups funded in the past year were founded by women only. It just seems very unlikely that this is a coincidence. I think it’s the same with machine learning and bias. I think it can pick up on these issues a little bit as well, because for it they’re just patterns that it sees: it’s like, well, if you’ve seen this, it’s unlikely … that’s why it comes up with these biased things. But we know better, so OK, let’s try to teach it that it’s not right.

Ciprian Borodescu [00:12:13] You’re absolutely right, and I have a question about that. But until then, because that’s an important topic and honestly, between me and you, because nobody else is listening: you don’t actually need AI to figure out that we have a gender problem in STEM, in the startup world, in the entrepreneurial world when it comes to female-founded startups. But more about that in a minute. Until then, I wanted to ask: what other industries are susceptible to AI or machine learning biases? And assuming that all industries are in fact affected, how should companies initiate conversations around ethics? Not just sporadically, here and there, but as a continuous dialog that I imagine should happen at the leadership level, yielding results. Where should they start and what are some of the most important things to consider when it comes to ethical AI at the company level?

Raluca Crisan [00:13:17] That is a great question. I’m not sure I’m the right person to answer, but I’ll give you my perspective. First of all, in terms of sectors, it’s definitely a horizontal problem. I think the financial sector is quite prone to bias just because automation is a bit more spread out there. Whereas in recruitment a lot of decisions still happen by humans, in the financial sector things are automated, which means they’re underpinned by ML or just a normal algorithm. So the problem is more immediate there, I’d say. But the other key sector we’re seeing is justice, as with the cases in the States and also recently a couple of cases in the Netherlands.

Ciprian Borodescu [00:14:07] Maybe we can stop for a bit here and you can give an example. I have to say that I’m not familiar with the Netherlands case. 

Raluca Crisan [00:14:20] As far as I can remember, they had some sort of decisioning system for giving out benefits to people, to citizens, and someone took them to court. I’m assuming it was an NGO, I don’t remember the details, but the court said: do not use algorithms for this anymore. It was quite extreme, which is why I’m saying that in Europe we’re going maybe a little bit too far. 

Raluca Crisan [00:14:58] There was another case in Italy, involving Deliveroo – no one would have thought of it. But essentially, when they’re matching jobs to workers, the algorithms they’re using could be considered discriminatory. Again, a court ruled on that, and the intricacy there was that, you know, let’s say you’re more available more often, things like this, then maybe you’re more likely to be of a certain demographic. So, for instance, if you’re a mom and you put your kids to bed at 8:30 and you’re never available at that hour, if you use those kinds of features it can be inferred that you’re discriminating against that demographic. I think it was a little bit more clear-cut than that, but that’s the main idea.

Ciprian Borodescu [00:16:10] Yeah, it’s interesting because you would think that they would not, like you said previously, outlaw using AI or machine learning, but instead introduce a hybrid approach where you can use both the machine learning algorithm and the human agent. Because essentially this is what’s happening in medicine, for example, or in health: there’s always a doctor there reading the machine-learning-generated results, and the combination of the two is always better than either individually.

Raluca Crisan [00:16:48] Yeah, I agree. And I don’t think the regulation said “we’re gonna outlaw all use-cases”. But even so I think that it was quite harsh for them saying: OK, stop using this for this particular purpose. 

Ciprian Borodescu [00:17:06] It creates a precedent, you know!? 

Raluca Crisan [00:17:09] I think there’s a lot of people in the EU that are quite clever and they’re thinking about it, so I don’t think they’re going to go that way, but I think they’re just going to make it a bit more stringent from a compliance point of view. 

Ciprian Borodescu [00:17:26] What are the conversations that you hear at the leadership level happening in various companies when it comes to ethical AI? Because there is no direct correlation with the bottom line.

Raluca Crisan [00:17:48] Yeah, that’s a great question. I think it’s quite unexpected where the incentive is coming from. There are financial institutions that are looking specifically at hiring algorithmic bias teams. But I think what’s more frequent is that we see a lot of credit risk teams repurposing their old checks and understanding how they need to evolve those and what they need to do. Because if they want to use more data or richer algorithms, then how does that translate into really checking for this, making sure it’s compliant and then, conversely, mitigating the issue?

Ciprian Borodescu [00:18:53] If I understood correctly, there are two components here: one is to identify and then the second is to solve it. Am I correct in assuming that? 

Raluca Crisan [00:19:03] Yeah, yeah. That’s roughly how we position it. 

Ciprian Borodescu [00:19:08] And let’s go together through a use case. That would be really interesting because, you know, whenever we talk about ethics and morality, it’s kind of really general. But I would like to make it a little bit more specific, if you don’t mind.

Raluca Crisan [00:19:22] Yeah, sure. From a data scientist’s point of view, let’s say they’re building a credit risk model. There are certain checks that they need to undergo, starting from the raw, unmastered data. There’s the potential that that data might be biased … and they know that there’s bias. That’s not the right way to phrase it. But there’s the potential that the data they have might lead to bias later on, and actually the odds of that are very high.

Raluca Crisan [00:20:02] So basically, what we provide at that stage is a pre-processing algorithm that helps them minimize that likelihood a little bit, and also a bit of investigative work to show them exactly where the issue is. So obviously, if they have a really big issue then we’re pointing that out so they can go back and sort out their data.

Raluca Crisan [00:20:30] So that’s one of the more streamlined cases. And it’s less at the management level; it’s more like: as I’m doing my work and I’m building this thing, I’m getting information that helps me get better and less biased. So one of the cases we have is around risk assessment, customer-level risk assessment.

Ciprian Borodescu [00:21:02] After the pre-processing happens and after you do your investigative work, you provide those companies or data science teams with your audit, let’s put it like that. And then they need to fix that in their master data, in the original data. And I’m curious to understand: what are the potential solutions? How can you fix your data or your machine learning algorithm to eliminate as much of the bias as possible?

Raluca Crisan [00:21:41] Yeah, there’s a lot out there, but I can tell you what we’re focusing on. The pre-processing one is more like redoing some distributions in the data to make sure it more accurately reflects the actual population.

Ciprian Borodescu [00:22:02] OK, kind of like balancing?

Raluca Crisan [00:22:07] A little bit like balancing, but the problem is that if you do very straight-out balancing, that’s not really gonna make too much of a difference. Obviously, you should do that, and I think that’s kind of what people already do, it’s just not enough. 
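[Editor’s note: a minimal sketch of the kind of distribution-fixing discussed above, going beyond naive balancing by attaching sample weights so each group’s total weight matches an assumed population share. All group names, shares, and data here are invented for illustration; ETIQ’s actual pre-processing is more sophisticated than this.]

```python
# Hypothetical toy dataset: each record has a demographic group and a label.
# Group "B" is under-represented relative to the population it should reflect.
import random

random.seed(0)

data = [{"group": "A", "label": random.random() < 0.5} for _ in range(900)]
data += [{"group": "B", "label": random.random() < 0.5} for _ in range(100)]

target_share = {"A": 0.7, "B": 0.3}  # assumed true population shares

def reweight(records, target):
    """Attach a sample weight so each group's total weight matches `target`."""
    n = len(records)
    counts = {}
    for r in records:
        counts[r["group"]] = counts.get(r["group"], 0) + 1
    for r in records:
        r["weight"] = target[r["group"]] * n / counts[r["group"]]
    return records

weighted = reweight(data, target_share)
total_b = sum(r["weight"] for r in weighted if r["group"] == "B")
print(round(total_b / len(weighted), 2))  # group B now carries 30% of total weight
```

Most training libraries accept such per-record weights (e.g. a `sample_weight` argument), so the model sees the corrected distribution without any records being dropped or duplicated.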

Raluca Crisan [00:22:26] It has to be more sophisticated and you have to really dig in and understand what the issue is. The other one is around the features themselves. The issue is that if you look at correlation or impact on the outcome, you’d have to remove, not necessarily all of your data, but a lot of it, because a lot of it is correlated with demographics.

Raluca Crisan [00:23:05] That’s just how life is. 

Raluca Crisan [00:23:12] And this is counterintuitive for data scientists. Generally, you want to use the most information; you don’t want to remove information from the training, because then you just get a worse model.

Raluca Crisan [00:23:26] The trick here is how you: (1) either change or modify the feature for some people, or (2) change something about the model itself so that you can still use a lot of those features but ensure they don’t have a detrimental impact on the groups that you’re worried about. And that’s when it gets quite subtle.

Raluca Crisan [00:23:51] All of a sudden, it becomes quite a subtle topic … otherwise you’re left with five variables.
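[Editor’s note: a toy sketch of the first step of what’s described above, flagging features that correlate with demographics and so may act as proxies. The feature names, values, and the 0.5 threshold are all hypothetical; real checks would use proper statistical tests, not a bare correlation cutoff.]

```python
# Flag candidate proxy features by correlating each one with a protected
# attribute (encoded 0/1). Everything below is made-up illustration data.
def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

protected = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "postcode_score": [0.1, 0.2, 0.1, 0.3, 0.8, 0.9, 0.7, 0.8],  # tracks the group
    "income":         [30, 55, 41, 38, 52, 29, 47, 44],           # does not
}

flagged = [name for name, vals in features.items()
           if abs(pearson(vals, protected)) > 0.5]
print(flagged)  # only postcode_score is flagged as a likely demographic proxy
```

The point of the interview stands out even in this toy: with realistic data, many features end up flagged, which is exactly why simply dropping them all would leave you with “five variables”.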

Ciprian Borodescu [00:24:03] While you were explaining, I was thinking, have you seen any situations whereby trying to fix a bias, you introduce another one? 

Ciprian Borodescu [00:24:13] The issue with the topic in general is how do you measure it? 

Raluca Crisan [00:24:24] If you were able to identify the bias exactly, then you should be able to pinpoint whether you’ve somehow introduced more of it. But unfortunately, different metrics will give you different outcomes.

Raluca Crisan [00:24:34] And also local metrics will have a different outcome from global metrics. And then what you consider and how you approach it is subtle; there’s a lot of decision-making on the part of data scientists, or the team around them, to deal with this topic.
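[Editor’s note: a small illustration of the point that different fairness metrics give different outcomes on the very same predictions. The groups, labels, and predictions below are invented; the two metrics sketched are standard ones, demographic parity (equal positive rates) and equal opportunity (equal true positive rates).]

```python
# Each tuple: (group, true_label, predicted_label). Made-up toy data.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 1), ("B", 0, 1), ("B", 0, 0),
]

def positive_rate(group):
    """Share of the group that receives a positive prediction."""
    preds = [p for g, _, p in records if g == group]
    return sum(preds) / len(preds)

def true_positive_rate(group):
    """Share of the group's true positives that the model catches."""
    pos_preds = [p for g, y, p in records if g == group and y == 1]
    return sum(pos_preds) / len(pos_preds)

dp_gap = abs(positive_rate("A") - positive_rate("B"))       # demographic parity
tpr_gap = abs(true_positive_rate("A") - true_positive_rate("B"))  # equal opportunity
print(dp_gap, tpr_gap)  # parity gap is 0, yet the opportunity gap is large
```

Here demographic parity declares the model fair (both groups get positives at the same rate) while equal opportunity flags a real gap (deserving members of group B are approved half as often), which is exactly the kind of disagreement the data science team has to adjudicate.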

Ciprian Borodescu [00:25:08] You previously mentioned that you came across teams that were dealing with AI ethics. Can you give us an example of the roles that were involved in those ethical teams? 

Raluca Crisan [00:25:26] Scientists with legal backgrounds are trying to look into this … I think we do see a lot of people from diverse backgrounds. But at ETIQ we’re not too sure how it’s going to play out.

Raluca Crisan [00:25:58] Outside of big companies that have AI ethics teams, we’re not sure what these teams are or should be. The reason I say this is because there’s the thesis that it’s not just the data that makes the model biased, although it might be; it’s also the people who build the model. Our role is to look at the overall landscape and understand that you can introduce bias at any point in the process.

Raluca Crisan [00:26:45] And as a result, if the people who are working on this process are not really reflective of the population they’re modeling for, they could introduce bias, because the hypotheses you come up with can be quite specific to your way of thinking.

Raluca Crisan [00:27:15]  I think we’re interested in seeing, instead of just having an opinion, how the market develops.

Raluca Crisan [00:27:21] I actually think we need more people from outside of that domain to start looking at this topic. Is this moving toward democratizing the area? I’d say yes, that’s true. At the same time, I think having more people look at this would be most welcome and makes a lot of sense.

Ciprian Borodescu [00:27:44] I mean, generally speaking, from what I’ve seen other companies or startups building, these are cross-functional teams: not only data scientists or machine learning engineers but also back-end developers. And not only engineers but also business people, because you need domain knowledge. And then not only business and engineering, but you also need legal. Like you said, an AI ethics team especially should definitely have a philosophical mind in there somewhere. I want to change gears a bit. Can you walk us through your fundraising experience over the past year? What were your highs and lows during those VC conversations?

Raluca Crisan [00:28:46] We’re not really at the VC stage; we’re more at small funds, angel funds. The key is understanding what they’re expecting and talking their language a little bit. I think that was another revelation for me: they just care about certain things, and it’s better for everyone if that information gets sent as quickly as possible.

Raluca Crisan [00:29:14] That was a learning process, and I was surprised how nice everyone was. I think everybody is very nice in this profession. It’s very transactional, in a good way. So I was surprised by that. It’s a sales process, which is not really my bag, but it’s useful to understand.

Raluca Crisan [00:29:46] I think it’s also useful to hear everyone’s feedback. Some of the angels are actually quite savvy when it comes to technology, and it’s very nice to see that perspective of the market as well. And some of the questions they ask really make you think about what kind of product it is, too. So that’s good.

Ciprian Borodescu [00:30:04] And I know that you guys went through an accelerator, was it in the Netherlands? 

Raluca Crisan [00:30:11] They’re a stakeholder in us, and I think they were pretty hands-on in terms of guiding us, but they didn’t take part in any decisions. Yeah, I don’t know, what was your experience? I’m just curious.

Ciprian Borodescu [00:30:39] For me, I think the world is divided into accelerators and accelerators. 

Ciprian Borodescu [00:30:45] And like I mentioned, you have the top three, which are obviously Y Combinator, Techstars, maybe 500 Startups, and then the rest of the startup accelerators. For me, my first accelerator was back in 2010 in Copenhagen. It was Startupbootcamp. And back then, for me personally, it was a life-changing experience because I was a developer, a software engineer, and all I did was write code all day long and think in technical terms. All of a sudden I was in a position to pitch; I didn’t even know what pitching meant. We’re talking about 10 years ago.

Ciprian Borodescu [00:31:33] Today, everybody knows it, because entrepreneurship gets a lot of attention. But back then it was not like that, especially for me or for my team. We were coming from Romania, where the entrepreneurial ecosystem 10 years ago was really, really underdeveloped. So that was the context. And of course, we understood then what it really means to build a tech startup, or a product. Until that point, we were just offering our services to whoever wanted to build a website, a web platform, or a mobile application, so we were pure engineers. 

Ciprian Borodescu [00:32:22] Then we went through another accelerator with a different startup in 2016 in the US, and that was a women-led accelerator. That’s where my co-founder Alexandra (CTO at that time) was in the front row. For me, it was a little bit difficult because, as the CEO, I was usually the one in the front row, pitching, selling, but now Alexandra was the one forced out of her comfort zone. That was an interesting experience for her, because after three, four months at that accelerator she transformed into a more complete person. Still a technical one, yes, but also acquiring some business knowledge as well: how to talk to investors, team members, and customers in general, and how to think about building tech products, not just code or engineering stuff. Kind of like understanding the other side of the table. And that’s useful, especially if you’re a founder.

Ciprian Borodescu [00:33:41] And then everything culminated with us being selected at Techstars Montreal AI with MorphL. And honestly, there’s a huge difference between Techstars and the rest of the accelerators, in that the power of that accelerator really is in the network of mentors. The mentors are extremely helpful. There’s a huge network and it really changed, again, if that’s possible, our life. It was a once-in-a-lifetime opportunity to be selected for Techstars.

Ciprian Borodescu [00:34:21] Why? 

Ciprian Borodescu [00:34:23] Because once you’re there, you’re part of a global network and you are one, two, or three steps away from being connected with a very important, experienced lawyer, or an experienced CEO, and so on and so forth. For example, during the acquisition process we went through with Algolia, we used the Techstars network a lot. So I think in general, an accelerator is a must-have experience for any startup.

Ciprian Borodescu [00:35:06] But then again, once you are mature enough, I think, as a founding team you can decide whether to pursue a second or third accelerator, at which point or at what level of expertise. But, yeah, that was our experience. 

Raluca Crisan [00:35:34] I guess you guys didn’t go through a company builder type of thing? 

Ciprian Borodescu [00:35:38] No, we didn’t! 

Raluca Crisan [00:35:41] I just want to see your opinion on accelerators – it seems like they’re everywhere in London.

Ciprian Borodescu [00:35:59] Now, for the final section of the podcast, Lightning Questions and Answers, a series of short fun questions that you have to answer really, really fast. Ready? 

Raluca Crisan [00:36:10] Yeah, yeah. 

Ciprian Borodescu [00:36:12] Favorite movie. 

Raluca Crisan [00:36:15] Lost in Translation.

Ciprian Borodescu [00:36:17] Oh, OK, nice one. Cats or dogs?

Raluca Crisan [00:36:23] I do have a cat, so. 

Ciprian Borodescu [00:36:25] Oh, that’s nice. 

Raluca Crisan [00:36:28] I like it, but it’s a hard one because dogs are great but tricky. 

Ciprian Borodescu [00:36:33] Favorite book? 

Raluca Crisan [00:36:37] Augustine’s Confessions. 

Ciprian Borodescu [00:36:40] OK. 

Raluca Crisan [00:36:41] It’s just a really good book. 

Ciprian Borodescu [00:36:43] London or Bucharest?

Raluca Crisan [00:36:45] Yeah, I’m in London. I’m from Arad and I’ve never lived in Bucharest, so London.

Ciprian Borodescu [00:36:53] And the last, bonus question. I know you have a toddler back home. How would you explain AI, not ethics, to your toddler in very basic words and maybe not now, but in a few short years? 

Raluca Crisan [00:37:07] She’s almost three now. 

Ciprian Borodescu [00:37:10] Did you try to explain what mom does for a living? 

Raluca Crisan [00:37:18] It’s more about maybe enticing her toward it a little bit. I’d ask her to think about how it is when she has to learn new words. And that’s kind of hard to do, I think, for anyone, but especially for a small child.

Raluca Crisan [00:37:35] You know, try to pinpoint some sort of self-awareness of these learning processes and then say this is basically a little bit like AI. Not really how it works, but what we’re trying to do. We’re building these things, kind of like making a small Sarah learning her words. I think she would really like a robot. I would give her a robot and she would probably understand some of it.

Ciprian Borodescu [00:38:17] Yes, exactly. It’s interesting, now that we’re talking about this: if you remember, 10, 20 years ago we didn’t have smartphones, and now everybody’s playing with smartphones. Even one-, two-, three-year-old kids – it’s so intuitive for them that no explanation is needed, and it’s insane. Who knows what it will be like 20 years from now when it comes to AI. Probably we’ll have toddlers being raised by an AI nanny … who knows!

Raluca Crisan [00:39:01] I’m sure someone, somewhere is already building it. 

Ciprian Borodescu [00:39:10]  I feel this conversation might be another episode. 

Raluca Crisan [00:39:13] Yeah. 

Ciprian Borodescu [00:39:16] It was a pleasure to have you on this podcast. Thank you so much for sharing your wisdom with me and with us. How can people reach out to you for ideas and comments? 

Raluca Crisan [00:39:26] I’m very much on LinkedIn, so that’s probably a good way to find me. I’d love to hear your thoughts on any bias and ethics in general. 

Ciprian Borodescu [00:39:49] Awesome, thank you so much Raluca. 

Raluca Crisan [00:39:53] Thank you. 

Ciprian Borodescu [00:39:58] All right. This was the Get Your AI On! podcast. Thank you all for listening and be sure to subscribe. We’re going to post a new episode every other week, so stay tuned for the next conversation. See you next time!