EU Chat Control is Worse than I Thought...

It is often said that Chat Control is a threat to our privacy, as it makes end-to-end encryption illegal and allows governments to check all messages shared within the EU. This is false, or at least misleading. The Chat Control law is, indeed, a serious privacy threat, but to understand why we need to dive into the complexity of the law. In this video I will first give some context about the law and its timeline, then I will explain why it's actually really easy to think that it's not a big issue, then I will explain why, yes, it is a terrible law, and finally I'll explain how we're going to save our privacy. Let's start.

Context

The story begins in 2020. Europe notices a significant increase in evidence of child sexual abuse material; in fact, just between 2020 and 2021, online sexual exploitation cases rise by a factor of ten and child sexual abuse videos on the Internet by 40%. The EU immediately goes ahead and approves a so-called "temporary" piece of legislation allowing chat, message and email service providers to check messages for "illegal depictions of minors and attempted initiation of contacts with minors"; right away, services like Facebook, Messenger and Gmail start looking for evidence of child sexual abuse material. This is opt-in legislation, that is: it only gives those companies permission to search messages for that kind of imagery, it does not make it required by law. One year later, in 2021, the European Parliament approves a piece of legislation called the ePrivacy Derogation, no longer temporary, that, again, allows message and email providers to search for child sexual abuse content.

It's now necessary to understand how Facebook and other companies check for this kind of content. A great example is Microsoft. Obviously, keeping a large public database of child sexual imagery to check for matches would be insane, so Microsoft designed a tool called PhotoDNA that relies on a hash function; a hash function takes an image and gives you a number. It should be very easy to compute this number from the image, but it should be impossible, given the number, to go back to the image. This way, every time police identify a sexually explicit child video or photo, only the hash of that content ends up in the database; and when you send an image over ... Discord, as an example, the hash of your image is checked against the hashes of the illegal content. And, yes, Discord does this for every single picture you send, it's not a random example.
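
If it helps to see the idea in code, here is a minimal sketch of that hash-matching flow, assuming a plain SHA-256 hash and made-up placeholder entries; the real PhotoDNA algorithm and database obviously look nothing like this:

```python
import hashlib

# Hypothetical database: only hashes of known illegal images are stored,
# never the images themselves.
known_hashes = {
    "placeholder-hash-1",  # a real database would hold millions of entries
}

def hash_image(path: str) -> str:
    """Turn an image file into a number (here, a SHA-256 hex digest)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def is_known_material(path: str) -> bool:
    """Check an outgoing picture against the database of known hashes."""
    return hash_image(path) in known_hashes

# A service would run something like this on every picture you send:
# if is_known_material("outgoing_photo.jpg"):
#     forward_for_review("outgoing_photo.jpg")  # hypothetical follow-up step
```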

Hash functions are usually not unique: if you have two different inputs, like two different pictures, it might happen that they have the same output, the same hash; this is part of why it's so hard to take a hash and go back to the original image, since there's no one original image, but rather many different possible ones. However, it is extremely unlikely in practice that two different images would have the same hash, so we're reasonably safe from false positives when comparing hashes. Now, it is usually a requirement for hash functions that changing the input slightly results in a wildly different output; this makes sure that it's really difficult for two similar things to accidentally have the same hash. However, such a hash function would be completely useless in our scenario, as it would mean that changing the image slightly would result in a completely different hash; you would only have to change one pixel to be safe from these automated checks. Because of that, PhotoDNA and similar tools use a different kind of hashing function, called a perceptual hash, which actually gives similar hashes to similar images. This will be particularly important later on.
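
To make "perceptual hash" a bit more concrete, here's a toy sketch of one of the simplest ones, the so-called average hash; it needs the Pillow library, and it is not what PhotoDNA does internally, it just shows how similar images end up with similar fingerprints:

```python
from PIL import Image  # pip install Pillow

def average_hash(path: str) -> int:
    """Toy perceptual hash: shrink to 8x8 grayscale, then set one bit per
    pixel depending on whether it's brighter than the average pixel."""
    img = Image.open(path).convert("L").resize((8, 8), Image.LANCZOS)
    pixels = list(img.getdata())
    avg = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > avg else 0)
    return bits  # a 64-bit fingerprint

def hamming_distance(h1: int, h2: int) -> int:
    """Count differing bits: a small distance means 'perceptually similar'."""
    return bin(h1 ^ h2).count("1")

# Two re-encoded or slightly cropped copies of the same photo should land
# within a few bits of each other, unlike with a cryptographic hash:
# hamming_distance(average_hash("photo.jpg"), average_hash("photo_resized.jpg"))
```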

One year later, in 2022, the European Commission decides that it would like to make this kind of check, which already exists on many platforms, required by law; this proposal is what is now called the Chat Control legislation, and it actually introduces much more than just making the current checks mandatory. The Council has had six months of workshops to define an initial draft, which might differ from the one proposed by the Commission; a group of the European Parliament called LIBE (the Committee on Civil Liberties, Justice and Home Affairs) also started working on its own draft. After this stage, we will have the "trilogue" discussion that will merge the three different drafts back into one. The cool thing is that we're very close to a Parliament draft; apparently there is an agreement already, and the vote will be in just a few days, on the 13th of November.

Just one thing, though: this video is not sponsored. Nobody pays for it, and yet it took one week, one week, to research and write, now I'm recording it with some quite expensive equipment, and then the editor is going to spend days and days editing it. Ad revenue alone isn't nearly enough to cover all the expenses; I'll probably make around thirty bucks? Fifty, if I'm lucky. The only reason this video exists is your donations. Above my head you should now see a progress bar with the money I need monthly to keep this whole thing running, and how much I currently have. I really want to thank everybody who donated, and if you'd like to, I've got PayPal, Liberapay, Ko-fi, Patreon, and so on. Anything is appreciated.

Without further ado, let's start talking about Chat Control.

Chat Control, A Good Idea?

First of all, chat, message or email services that do not get any remuneration (not even from advertisement) are not affected by this legislation; this means that open source projects such as KDE's Matrix client, NeoChat, are certainly safe and won't be changed at all. Note, however, that this legislation does apply to services that are not based in Europe but still operate here, such as... Telegram, WhatsApp, and so on. This legislation covers telephony, e-mail, messengers and chats; that includes video games with in-game chats and videoconferencing software.

Just in case you don't know, almost all chat software requires a middleman that takes your messages and delivers them to the target person: I give Telegram the message, and Telegram gives my girlfriend the message. Lots of chat services encrypt the message in these two steps to make sure nobody else can read it: I encrypt and send to Telegram, Telegram decrypts the message, then encrypts it differently and gives it to my girlfriend. This means that Telegram can actually read my messages and my images, and the same goes for platforms such as Discord; this is how they're able to check your pictures with PhotoDNA: they are able to see them.

However, some other chat services, such as WhatsApp, use a different type of encryption, called end-to-end encryption, where only the recipient of the message can decrypt it. So, I encrypt my message, I give it to WhatsApp, WhatsApp cannot decrypt it and can only pass it on to my girlfriend as it is, and only she can actually read it. This allows for a much higher standard of privacy, and it is highly recommended, to say the least.

So far, all systems using PhotoDNA have worked on messages that are not end-to-end encrypted; however, the draft legislation says that end-to-end encrypted messages should be checked as well, using similar techniques. Fear not, though, because this is actually possible. Basically, instead of having WhatsApp compute the hash of your messages, which it can't, the hashing would be done directly on my device when I send the message. There are two ways to do that.

First one: after I hit send, my phone hashes the message and sends the hash to WhatsApp. This is supposedly safe, because they cannot know what your message is based on the hash alone; and now that they have it, they just check it against the database. Second way: when I install WhatsApp, the application automatically downloads the database of hashes onto my phone; after I hit send, my phone hashes the message and checks it against the database directly, without sending any data to WhatsApp at all.
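
In code, the two variants would look roughly like this; the function names (upload_hash_to_provider, report_match) are made up for the sketch, and I'm using a plain SHA-256 as a stand-in for the perceptual hash:

```python
import hashlib

def hash_content(data: bytes) -> str:
    # Stand-in for a perceptual hash; a real deployment would use a
    # PhotoDNA-style perceptual hash instead.
    return hashlib.sha256(data).hexdigest()

# Variant 1: the device computes the hash and sends only the hash to the
# provider, which checks it against the database on its servers.
def send_variant_1(message: bytes, upload_hash_to_provider) -> None:
    digest = hash_content(message)
    upload_hash_to_provider(digest)   # provider compares it with its database
    # ...then the end-to-end encrypted message is sent as usual

# Variant 2: the provider ships the whole hash database to the device and the
# check happens locally; nothing leaves the phone unless there is a match.
def send_variant_2(message: bytes, local_hash_db: set, report_match) -> None:
    digest = hash_content(message)
    if digest in local_hash_db:
        report_match(digest)          # only matches would be reported
    # ...then the end-to-end encrypted message is sent as usual
```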

On top of all of this, the Commission wants companies to check for child sexual abuse content on hosting services too, which includes web hosting, social media, video streaming services, and cloud services. In fact, Google Photos already scans your pictures for child pornography, so this shouldn't come as a surprise, but this legislation would make it mandatory for companies like Nextcloud as well; or, rather, for companies running a Nextcloud install. I think.

However, the Commission also decided to take all of this to the next step. The method I've talked about, PhotoDNA, only allows searching images for known child sexual abuse content; the Commission would like it to be mandatory to search for unknown content as well. From their point of view the move is pretty obvious, I would say, and in practice it would involve a machine learning-based application that automatically flags content as problematic. Again, in the context of end-to-end encryption all of this would have to be done directly on your phone, to make sure that you're not sending your data around. And not only images: the legislation would also like clients to check for grooming, meaning that all text messages would also go through a text-based machine learning model that automatically flags grooming.

How reliable are these automated tools? Again, Microsoft has one, and they report an accuracy of 88%, meaning that out of one hundred conversations flagged as grooming, 12 are false positives. Note that these are manually filtered out, and Microsoft will not contact law enforcement purely based on the automated tool.

Finally, there is a requirement for app stores such as Google Play or Apple Whatever to verify the age of users and block children (age 16 and under) from applications that could be misused for grooming purposes; I will note here that, at the time of writing, I was unable to understand the exact meaning of this age verification. Some sources expect a very hard, formal age requirement, which would require all Google Play users to identify themselves; other sources think this could also refer to a lighter kind of verification, though I'm not sure what that would be. I will get back to this later on in the video.

Nonetheless, this is a very brief explanation of all the things contained in the Chat Control proposal by the European Commission. As you can see, it seems like end-to-end encryption is guaranteed, all data processing is done directly on your device, and privacy is safe. However, the situation is not as good as it might seem, as we're about to discover. And really, this should serve as a reminder that understanding policy is always extremely complex.

Chat Control, A Bad Idea!

Let's immediately start with end-to-end encryption and this idea of looking up hashes directly on your device, or even sending those hashes to a third party. This concept is called "Client Side Scanning", CSS, and it has some major drawbacks. If the hash lookup is done on your device, there are obvious technical challenges: your phone would have to store the hashes of all known child sexual abuse media. This isn't an issue from a legal point of view because, again, it is impossible to restore the original content from its hash, but it does require your phone to store quite a big database of hashes, which has to be constantly kept up to date. On top of that, if the hashing is done only on your device, you have to make it very hard for users to disable that component by, I don't know, deleting some files or killing some processes. Not quite that, but you get the idea.

However, sending your hash to anybody has a much bigger privacy drawback. Let's say I write a message that says "I hate the Italian government"; my phone will turn that into a hash and send it to WhatsApp. They won't be able, in any way, to restore my initial message from the hash… unless they guess it. What they can do is say: hey, I think we should check for everybody saying "I hate the Italian government". They can hash that exact wording and compare the resulting hash against all of their messages. My message would have the same hash, so they would know that I sent that exact message. Now, it is insane to check all possible combinations of text messages with this technique, meaning that, yes, the vast majority of messages will never be understandable to WhatsApp; but they can still track the specific messages they would like to know about.
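
Here is a tiny sketch of that guessing attack, with made-up phrases, to show how little protection the hash gives against somebody who already knows what they're looking for:

```python
import hashlib

def hash_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The provider only ever stores hashes of outgoing messages...
observed_hashes = {hash_text("I hate the Italian government")}

# ...but it can still test any specific phrase it cares about:
phrases_to_watch = [
    "I hate the Italian government",
    "meet at the protest tomorrow",
]

for phrase in phrases_to_watch:
    if hash_text(phrase) in observed_hashes:
        print(f"someone sent exactly: {phrase!r}")
```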

And, really, this is working as intended; this whole hashing thing is supposed to make it easy for WhatsApp to search all messages for a certain image or text that they decide on. The Chat Control legislation says that this should only be used to search for child sexual abuse material, but that obviously provides no guarantee that our privacy is safe, because it is technically feasible for them to search for any kind of text or image, not just that. If you follow politics, it is very easy to see why we should never allow the existence of tools that can, technologically, be used as a citizen-tracking device just because the legislation says "tracking users is illegal". The very first authoritarian government might want to extend the scope from child abuse to any kind of sexual abuse material, such as revenge porn, that's a good cause, right? And then it might be extended to all illegal material, then to all anti-government material, and so on. It's really not that hard to find a government that would be more than happy to use this kind of tool to search for opposition messages; just in Europe we have governments that are slowly degrading their democracy, such as Belarus. But even the Italian government has made it clear that it will not tolerate any kind of public anti-government ideas.

What if the hashing is done on your device, then? Aside from the technical issues I've talked about, there are a couple of big challenges. The first one is: what happens if your device finds that you are trying to send child sexual abuse material? What the Chat Control legislation would like to happen is that the flagged messages are shared with a relevant authority that reviews them and, potentially, opens an investigation on you. This has a small issue, which is false positives. Sometimes the hashing doesn't hash hashingly and you accidentally get flagged for child abuse. Happens. Keep in mind we're talking about billions of messages every day. In those scenarios, some of your messages, sometimes, might accidentally be shared with a third party, which would be able to review them. This is a privacy concern.

There's also another, much bigger challenge: as I've said before, you cannot retrieve the content from its hash. This means that you cannot retrieve child sexual abuse videos from the database of hashes you're given, which is great, but it also means that we have no clue whatsoever about what's actually in the database. If the entity that puts the database together decides to add some images that are not child sexual abuse content, such as revenge porn images, we would literally have no way of knowing. The entity that maintains this database would obviously be quite opaque, because the alternative would be to publish every piece of child sexual abuse material they find, which they can't do. So we would simply have to trust them, blindly, that they are not searching our messages for anything except child pornography. Even though they can. And we wouldn't know.

Even if the entity is extremely trusted and has mechanisms in place to make sure this won't ever happen, this database adds a pretty big attack surface for hackers. As I said, the database would have to be constantly kept up to date. If somebody wanted to read some of my end-to-end encrypted messages, they could try to modify this database, either by pretending to be the above entity and sending a fake update, or by directly modifying the database on my device if they get access to it. If that happens, and the database is compromised in any way, then my device would start sending some of my private messages to the above-mentioned external authority, instead of sending them end-to-end encrypted to the person I'm talking to. Sure, we can try as hard as possible to make this infeasible with integrity checks for the database and such; it's just that all of this widens the surface area that bad actors can use to attack us.
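
Just to illustrate what such an integrity check might look like, here is a minimal sketch that refuses unsigned database updates; it uses a symmetric HMAC only to keep the example short, while a real deployment would need asymmetric signatures (for example Ed25519), so that the phone only holds a public verification key:

```python
import hashlib
import hmac

# Hypothetical key; in reality you'd want an asymmetric scheme so that a
# compromised device can't be used to forge updates for everyone else.
UPDATE_KEY = b"hypothetical-update-key"

def sign_update(db_blob: bytes) -> str:
    return hmac.new(UPDATE_KEY, db_blob, hashlib.sha256).hexdigest()

def apply_update(db_blob: bytes, signature: str, install) -> bool:
    """Refuse any hash-database update whose signature doesn't verify."""
    if not hmac.compare_digest(sign_update(db_blob), signature):
        return False       # tampered or spoofed update: reject it
    install(db_blob)       # hypothetical installer that swaps in the new database
    return True
```

Even with something like this in place, the point stands: every extra moving part is extra surface area.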

All of this assumes that the hashing function works as described. The funny thing is, though, that it doesn't, actually. It's pretty easy to write a normal hash function as I've described it, but remember we are not talking about a normal hash, we are talking about a perceptual hash, which requires small changes in the input to produce small changes in the output, so that we can check for similar images and not just perfect matches. The cool thing is that an open letter signed by hundreds of university researchers clearly states that, even after twenty years of research, no such function exists. Quite the opposite: we have shown that all the perceptual hashing functions we know of today don't work.

This is because it is virtually always possible to make a small change to an image that results in a big change of the hash, meaning that bad actors can always slightly alter an image in a certain way and go unnoticed. On top of that, it is always possible to create an image that looks completely normal and yet has the same hash as a child pornography image, meaning we can generate images that we are certain will result in false positives. It's pretty easy to see the issue here: a bad actor might generate such an image and use it to frame an innocent person, or to flood law enforcement agencies with false positives. Both of these attacks have been demonstrated, successfully, against both PhotoDNA and Apple's NeuralHash.

(Wait, what's NeuralHash? Well, basically, Apple tried to implement Client Side Scanning, CSS, and they did actually introduce it saying that it was the very best. Aaand they removed it after a couple of months. So, yeah. And, so far, that was the only time CSS was ever attempted, because PhotoDNA is not used with end-to-end encrypted messages. So, yes, Europe wants to make mandatory a technology that has only been attempted once, by Apple, and they failed.)

The only way to try to avoid all of these issues is to design a hash function and keep it secret. Just don't tell anybody how it works. Which, aaaarrgghh, is not going to work, and I really hope you can see why without me explaining it. The last important quote from the open letter is: "As scientists, we do not expect that it will be feasible in the next 10 to 20 years to develop a solution that can run on users' devices without leaking illegal information and that can detect known content in a reliable way." That's... quite a statement.

All of this is about the search for known child sexual abuse material. But of course, Chat Control also wants to introduce a search for unknown content using machine learning tools. Now, AI tools can be trained to identify certain patterns with high levels of precision; however, they routinely make mistakes, which makes them a terrible idea once we realize the scale we're talking about. Even scanning the messages exchanged in Europe on one single service provider would mean generating millions of errors every day; and even if this machine learning model runs locally on your device, each false positive means, again, that your message won't be private at all, it will instead be shared with external entities, and they might even contact law enforcement because of that false positive. Again, an amazing quote: "this cannot be improved through innovation: 'false positives' are a statistical certainty when it comes to AI". Especially when we're talking about applying it to all messages sent in the European Union.
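
To get a feeling for the scale, here's some back-of-the-envelope arithmetic with purely illustrative numbers (neither the message volume nor the error rate comes from the legislation; the point is the order of magnitude):

```python
# Purely illustrative numbers.
messages_per_day = 5_000_000_000   # messages on one large EU provider
false_positive_rate = 0.001        # a very optimistic 0.1% error rate

false_alarms_per_day = messages_per_day * false_positive_rate
print(f"{false_alarms_per_day:,.0f} innocent messages flagged per day")
# -> 5,000,000 innocent messages flagged per day, each one potentially
#    forwarded to a human reviewer or to law enforcement
```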

Because of all of this, even though end-to-end encryption is technically preserved under Chat Control, CSS represents a privacy risk big enough to put the entire concept of secure communication at stake, because you can never know whether a message of yours will actually be sent to the recipient, or whether it will become a false positive and instead be sent to somebody else, who might contact law enforcement because of it. We cannot allow that to happen.

Honestly, it feels like I could end the video here, this is already pretty darn bad, but yeah, there are a couple more things to talk about.

Very quickly: I will point out that another major issue with all of this is money. It is slightly unclear to me who is going to perform the manual check of the messages that were detected through hashes or machine learning tools; this might be done directly within the companies providing the services, which would have to hire people whose job is to look at child sexual abuse images all day (and there are articles upon articles on how difficult and unhealthy those job environments are, especially if there's a lack of proper compensation and mental health services), but it might also be done directly on the law enforcement side, as they already have groups designated for it. Regardless, it is pretty clear that this will require an increase in public spending to make sure there's enough staff to go through all of the, potentially false positive, reports raised out of the billions of messages we send every day. Right now, 80% of the reports that reach police are criminally irrelevant, and I expect that number to skyrocket if this legislation passes as it is now.

Secondly, we already have reported cases of people wrongly accused of possessing child pornography; as an example, there are at least two cases of parents who took intimate pictures of their son or daughter to send to their doctor during the COVID lockdown. They lost all access to their Google account (e-mails, calendar, even their internet connection) and their local police actually investigated them for months, after which the police contacted them to say: hey, we found literally nothing illegal, you just took a picture for the doctor. And yet, Google refused to reinstate their accounts and automatically wiped all of their data after six months. This is the best-case scenario. We also know of people who were wrongly accused of using child pornography, and who took their own lives.

Thirdly, if the age verification rule ends up being interpreted in a somewhat strict manner, it will be impossible to use any kind of online messaging platform anonymously. Given that app stores have to check the age of their users, they will require some sort of document or ID before granting access to any messaging platform. We are particularly worried about compulsory identification, the collection of biometric data, and interference with encrypted communication. What could be preferred, as advocated by Der Kinderschutzbund (insert pronunciation joke here), is that large advertising-financed platforms automatically flag children as such, given that we've seen they've been able to identify children and young people as users for a long time. We might also want parents to create smartphone accounts that are specifically flagged as children's accounts through a voluntary age declaration. This way, parents would be certain that their children won't be able to download any kind of dangerous application without the parents' consent.

Please note that I've only scratched the surface of the criticism of Chat Control, and I will leave quite a lot of sources in the video description that will tell you more about this proposed legislation. However, we're already pretty deep into the video, and I'd like to now start talking about how, luckily, many of the issues with Chat Control are being addressed through the European legislative process.

How We'll Defeat This

I talked early on about how there are actually three drafts of Chat Control: the original one from the Commission, a draft from the Parliament, and one from the European Council. Then all three are merged together. Luckily, the drafts of the Parliament and the Council appear to be significantly better than the Commission's. Let's dive into that.

The Council decided to keep roughly everything except the machine learning part. So all of that, gone.

The Parliament draft is a bit more complex. There seems to be a deal on how it will look, which is great, and, again, the vote on it should be in just a few days, on the 13th. It is worth noting that part of the LIBE group is Patrick Breyer, who has done an outstanding job of communicating the issues of this proposal to the public. He's part of the Greens/EFA group, and he was elected through the German Pirate Party in 2019. This is extremely impressive, considering how small the Pirate Party is; it still manages to run in elections in multiple European countries. The best result they achieved was 7.7% in Luxembourg, and they managed to elect four people to the European Parliament at the latest European elections.

I say this not because they paid me or anything, in fact, I am part of another political party, so I can only lose by talking about them, but rather because, if you care about all of this and want privacy-minded people to be elected and able to negotiate better legislation, we do have a European election coming up next summer (between the 6th and 9th of June). If you live in Europe, it might be worth checking whether the Pirate Party will be on your ballot, and whether you think they are offering a good candidate.

Back to Chat Control, though. The draft that will hopefully be voted on in a few days states that end-to-end encrypted messaging services don't have to follow this legislation. This means that CSS, with all the issues we know it has, wouldn't have to be deployed at all. It also asks for telephony and text messages to be removed from the bill (even though messenger services are still there), and it removes the machine learning part, but only for grooming: they're still proposing machine learning to be used to detect child sexually explicit content in images. The Parliament draft also removes age verification entirely.

This immediately makes the bill much more sensible from a privacy point of view, especially given that end-to-end encryption is kept out of it. However, even if this draft is accepted, it doesn't mean that it will be reflected in the final text, especially given that, in my humble opinion, it would make the law hardly of any use. Even extremely common chat services like WhatsApp are end-to-end encrypted, it's really easy to enable that in Telegram as well, and it's really, really easy to download other end-to-end encrypted chat services; I guess that people who use and share child pornography would very easily switch to one of those services, or, most likely, just keep using what they were using, given that it's probably end-to-end encrypted already. I'm not saying this because I agree with CSS; rather, I'm saying that, given how important end-to-end encrypted messages are to the intent of the legislation, I wouldn't be surprised to see the Parliament draft ignored on this point. We'll see.

I think this is the longest scripted video I've ever done, so, again, links in the video description both for sources and for donations to the channel, which would be extremely, extremely appreciated. See ya.