When you sign up for a newsletter, make a hotel reservation, or check out online, you probably take for granted that if you mistype your email address three times or change your mind and X out of the page, it doesn’t matter. Nothing actually happens until you click the Submit button, right? Maybe not. As with so many assumptions about the web, this isn’t always the case, according to new research: A surprising number of websites are collecting some or all of your data as you type it into a digital form.
Researchers from KU Leuven, Radboud University, and the University of Lausanne crawled and analyzed the top 100,000 websites, comparing what happens when a user visits a site from the European Union with what happens when a user visits from the United States. They found that 1,844 websites gathered an EU user’s email address without their consent, and a staggering 2,950 logged a US user’s email in some form. Most sites do not appear to be deliberately logging the data themselves; instead, they incorporate third-party analytics and marketing services that cause this behavior.
After specifically crawling sites for password leaks in May 2021, the researchers also found 52 websites in which third parties, including the Russian tech giant Yandex, were incidentally collecting password data before submission. The group disclosed their findings to these sites, and all 52 instances have since been resolved.
“If there’s a Submit button on a form, the reasonable expectation is that it does something: that it will submit your data when you click it,” says Gunes Acar, a professor and researcher in Radboud University’s digital security group and one of the leaders of the study. “These results were amazing. We thought maybe we were going to find a few hundred websites where your email is collected before you submit, but this exceeded our expectations by far.”
The researchers, who will present their findings at the Usenix security conference in August, say they were inspired to investigate what they call “leaky forms” by media reports, particularly from Gizmodo, about third parties collecting form data regardless of submission status. They point out that, at its core, the behavior is similar to so-called keyloggers, which are typically malicious programs that log everything a target types. But on a mainstream top-1,000 site, users probably won’t expect to have their information keylogged. In practice, the researchers observed several variations of the behavior. Some sites log data keystroke by keystroke, while others grab the complete contents of a field when the user clicks into the next one.
“In some cases, when you click the next field, they collect the previous one, like you click the password field and they collect the email, or you just click anywhere and they collect all the information immediately,” says Asuman Senol, a privacy and identity researcher at KU Leuven and one of the study co-authors. “We didn’t expect to find thousands of websites, and in the US, the numbers are really high, which is interesting.”
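To make the variations concrete, here is a minimal Python sketch that replays one abandoned form session against three collection policies: per keystroke, when the user leaves a field, and only on submit. It is an illustration under stated assumptions, not the code any real tracker runs. The names `FormEvent` and `Collector`, the event kinds, and the sample address are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FormEvent:
    kind: str        # "input", "blur" (user leaves the field), or "submit"
    field_name: str  # field the event fired on ("" for submit)
    value: str = ""  # field contents at the time of an "input" event


@dataclass
class Collector:
    """Hypothetical script that sends data whenever `trigger` fires."""
    trigger: str
    sent: list = field(default_factory=list)

    def observe(self, events):
        state = {}  # latest known contents of each field
        for ev in events:
            if ev.kind == "input":
                state[ev.field_name] = ev.value
            if ev.kind != self.trigger:
                continue
            if ev.kind == "submit":
                # Submit-only policy: send the whole form once.
                self.sent.append(dict(state))
            else:
                # Keystroke or blur policy: send the field just touched.
                self.sent.append({ev.field_name: state.get(ev.field_name, "")})
        return self.sent


# The user types an address, clicks into the next field, then closes the page.
session = [
    FormEvent("input", "email", "a"),
    FormEvent("input", "email", "a@"),
    FormEvent("input", "email", "a@b.co"),
    FormEvent("blur", "email"),
]

per_keystroke = Collector("input").observe(session)
on_blur = Collector("blur").observe(session)
on_submit = Collector("submit").observe(session)

print(len(per_keystroke), on_blur, on_submit)
```

In this toy session the user never submits, so only the submit-only policy sends nothing. The other two have already transmitted the address by the time the page is closed.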
The researchers say that the regional differences may be related to companies being more cautious about user tracking, and even potentially integrating with fewer third parties, because of the EU’s General Data Protection Regulation. However, they stress that this is only one possible explanation and that the study did not examine other causes.
Through a substantial effort to notify websites and third parties collecting data in this way, the researchers found that one explanation for some of the unexpected data collection may have to do with the challenge of differentiating a “submit” action from other user actions on certain web pages. However, the researchers stress that, from a privacy perspective, this explanation is not sufficient.
Since completing the paper, the group also made a discovery about Meta Pixel and TikTok Pixel, invisible marketing trackers that services embed on their websites to track users across the web and show them ads. Both stated in their documentation that customers could enable “automatic advanced match,” which would trigger data collection when a user submits a form. In practice, however, the researchers found that the tracking pixels were grabbing hashed email addresses before submission. These obscured versions of email addresses are used to identify users across different platforms. For US users, 8,438 sites may have been leaking data to Meta, Facebook’s parent company, through pixels, and 7,379 sites may have been affected for EU users. For TikTok Pixel, the group found 154 sites for US users and 147 for EU users.
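Why can an obscured email address still identify someone? A short Python sketch shows the idea, assuming for illustration that the address is trimmed, lowercased, and run through SHA-256. The exact normalization and hash used by any particular tracker is an assumption here, and `hash_email` and the sample addresses are hypothetical.

```python
import hashlib


def hash_email(address: str) -> str:
    """Return a hex digest of a normalized email address (assumed scheme)."""
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# The same person typing the same address on two unrelated sites,
# with different capitalization and stray whitespace.
digest_site_a = hash_email("Jane.Doe@example.com")
digest_site_b = hash_email("  jane.doe@EXAMPLE.com ")
digest_other = hash_email("john.roe@example.com")

same_person = digest_site_a == digest_site_b
print(same_person, digest_site_a != digest_other)
```

Because the digest is deterministic, it behaves like a stable pseudonym. Two sites that see the same digest can link the visits to one person without either holding the readable address.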
The researchers filed a bug report with Meta on March 25, and the company quickly assigned an engineer to the case, but the group has not heard an update since. The researchers notified TikTok on April 21 (they discovered the TikTok behavior more recently) and have not heard back. Meta and TikTok