Calculating Data Risk (part 2): Data Sprawl via Email

Alex Panagides
Published in DataDrivenInvestor, Nov 23, 2020

We continue our data-driven exploration of the unseen risks of email using mxHero’s Calculator, this time examining data risk resulting from email content sprawl.

Second in a series showcasing results of the mxHero ROI Calculator

In our last post, we took a closer look at the mxHero ROI Calculator’s estimate of accidental file deliveries caused by address errors in emails. This article focuses on one of its most startling estimates: data sprawl caused by email. As we have written before, email’s architecture duplicates content with viral efficiency, because each message and its embedded file attachments are replicated at every stage of the delivery chain. This duplication happens automatically for everyone addressed in the email: the sender plus every recipient in To, Cc, and Bcc.


When email is used to send files, the data sprawl implications are striking. For everyone addressed in an email, a full copy is created and then duplicated many times during delivery. From the outset, a copy is placed in the sender’s outbox. Each recipient gets a copy in their inbox. For the sender and each recipient, copies also exist in their respective email systems, typically across redundant email servers. New copies are created as emails are archived. Finally, when recipients save files from the emails, more copies are created on local devices, which today often means several devices per person: a laptop, a mobile phone, and others.

Files sent through email are duplicated at multiple stages. Copies are created when emails are stored to redundant email servers and archives, sent to external systems (e.g., CRM), received by the recipients in To, Cc, and Bcc, and when saved to potentially multiple devices.
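To make the duplication chain concrete, here is a minimal sketch that counts the copies a single attachment produces. It is an illustrative model of our own, not the Calculator’s code: the `Party` fields and the `copies_of_attachment` function are hypothetical, and every storage count is an input you supply.

```python
from dataclasses import dataclass


@dataclass
class Party:
    """One person addressed in an email (sender or a To/Cc/Bcc recipient).

    Hypothetical model: all counts are illustrative inputs, not mxHero figures.
    """
    mail_servers: int = 1   # redundant mail servers holding this person's mailbox
    archives: int = 1       # archive systems that capture the message
    mail_clients: int = 0   # desktop email clients syncing the mailbox locally
    saved_devices: int = 0  # devices the attachment is saved to as a file


def copies_of_attachment(parties: list[Party], external_systems: int = 0) -> int:
    """Count copies of one attachment created by a single email.

    Each party holds a copy wherever their mail is stored, plus one per device
    they save the file to. `external_systems` covers integrations such as a CRM
    that also ingest the message.
    """
    per_party = sum(
        p.mail_servers + p.archives + p.mail_clients + p.saved_devices
        for p in parties
    )
    return per_party + external_systems


if __name__ == "__main__":
    # Hypothetical email: sender plus three recipients, each mailbox on two
    # redundant servers, one archive, and one desktop client; two recipients
    # save the file to a laptop and a phone; a CRM also ingests the message.
    sender = Party(mail_servers=2, archives=1, mail_clients=1)
    savers = [Party(mail_servers=2, archives=1, mail_clients=1, saved_devices=2) for _ in range(2)]
    other = Party(mail_servers=2, archives=1, mail_clients=1)
    print(copies_of_attachment([sender, *savers, other], external_systems=1))  # 21
```

Even this modest example turns one file into 21 copies, before any forwarding.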

As a result, when accounting for the many points of duplication, the mxHero Calculator shows that even for a small organization of 10 users, email attachments create a shocking 397,664 file copies per year. This result assumes no server or archive redundancy, which is atypical. For larger organizations, the estimates are staggering. One of our clients, an enterprise of 20,000 employees, comes out at an unbelievable 795,305,190 file copies created per year through email attachments. It’s no wonder that organizations can’t keep their sensitive data under control.
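The two headline figures scale almost exactly linearly with headcount. The quick check below divides each by its employee count (roughly 39,765 copies per employee per year in both cases) and projects the 10-user total to 20,000 employees, which lands within about 0.003% of the reported figure. The function names are our own; we assume the small residual comes from rounding inside the Calculator, which we cannot confirm.

```python
# Headline estimates from the mxHero ROI Calculator, as quoted in the article.
REPORTED = {10: 397_664, 20_000: 795_305_190}


def copies_per_employee(copies: int, employees: int) -> float:
    """Annual file copies attributable to each employee."""
    return copies / employees


def linear_projection(base_employees: int, target_employees: int) -> float:
    """Scale a reported total linearly from one headcount to another."""
    return REPORTED[base_employees] * target_employees / base_employees


if __name__ == "__main__":
    for employees, copies in REPORTED.items():
        rate = copies_per_employee(copies, employees)
        print(f"{employees:>6,} employees: {rate:,.1f} copies/employee/year")
    projected = linear_projection(10, 20_000)
    gap = abs(projected - REPORTED[20_000]) / REPORTED[20_000]
    print(f"linear projection: {projected:,.0f} (differs by {gap:.3%})")
```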

Methodology

The Calculator starts by multiplying the number of employees by the number of email servers plus email archives, to account for message copies in the email infrastructure. In the calculation above, we assume only one email server and one archive, which, as mentioned, is atypical: organizations usually employ redundancy, so the estimate already understates reality. The number of users and storage systems gives a rough (under)estimate of the number of email copies. Next, we need the number of files contained in those emails. For this, we use a finding from a study by Radicati: 53 emails with attachments sent and received per worker per day, which we multiply by an estimated 1.2 files per email. Finally, we estimate the number of files that users save to their local devices. For example, a user receives an Excel file by email, downloads it, and saves it to their working folder.
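A minimal sketch of the base model as we read it from the description above; it is not mxHero’s implementation. The article does not state how many working days per year the Calculator uses or what share of emailed files users save locally, so `working_days_per_year` and `saved_fraction` are hypothetical placeholders, and all names are our own. With these placeholder values, 10 employees come out at about 397,500 copies a year, near the Calculator’s 397,664; the values were picked for that reason and should not be read as the Calculator’s actual parameters.

```python
from dataclasses import dataclass


@dataclass
class CalculatorInputs:
    employees: int
    email_servers: int = 1                    # article assumes one server (no redundancy)
    archives: int = 1                         # article assumes one archive
    attachment_emails_per_day: float = 53.0   # Radicati figure quoted in the article
    files_per_email: float = 1.2              # article's estimate
    # Placeholders below are NOT given in the article (hypothetical values).
    working_days_per_year: int = 250
    saved_fraction: float = 0.5               # share of emailed files saved locally


def files_per_employee(m: CalculatorInputs) -> float:
    """Attachment files one employee sends and receives by email in a year."""
    return m.attachment_emails_per_day * m.files_per_email * m.working_days_per_year


def annual_file_copies(m: CalculatorInputs) -> float:
    """Copies on mail servers and archives, plus copies saved to local devices."""
    files = files_per_employee(m)
    in_mail_infrastructure = (m.email_servers + m.archives) * files
    saved_locally = m.saved_fraction * files
    return m.employees * (in_mail_infrastructure + saved_locally)


if __name__ == "__main__":
    print(f"{annual_file_copies(CalculatorInputs(employees=10)):,.0f}")
    # Adding one redundant mail server raises the total, as the text notes.
    print(f"{annual_file_copies(CalculatorInputs(employees=10, email_servers=2)):,.0f}")
```

Because every term is multiplied by `employees`, the model is linear in headcount, which matches how the article’s 10-user and 20,000-user figures relate.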

The methodology above does not count copies of files downloaded to the user’s desktop email client (e.g., Outlook; web clients such as OWA or G Suite are excluded because they run in the cloud). It only counts copies on email servers and in archives. Nor does it account for users downloading a file to more than one device, for example both a mobile phone and a laptop. If we adjust the model to include email copies held in the user’s desktop client (e.g., Outlook) and assume that 10% of files are also downloaded to an alternative device, the number of file copies created per year by email attachments for a 100-person company jumps from 3,978,493 to 5,690,999, about 43% more. Unfortunately, the data sprawl that the Calculator conservatively estimates is probably much worse in reality.
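The adjustment adds one more stored copy per mailbox (the desktop client) and inflates local saves by the second-device share. This is a hedged sketch with hypothetical names: we read the 10% as applying to files already saved locally, and the per-employee volumes (15,900 files exchanged and 7,950 saved per year) are placeholders, not figures from the article. Under those assumptions the increase comes out near 42%, in line with the roughly 43% implied by the article’s own figures, which the last lines compute directly.

```python
def annual_file_copies(
    employees: int,
    files_per_employee: float,
    saved_per_employee: float,
    email_servers: int = 1,
    archives: int = 1,
    local_clients: int = 0,
    second_device_share: float = 0.0,
) -> float:
    """Annual attachment copies, optionally counting desktop-client copies and
    extra downloads of saved files to a second device (hypothetical model)."""
    stored = (email_servers + archives + local_clients) * files_per_employee
    saved = saved_per_employee * (1 + second_device_share)
    return employees * (stored + saved)


if __name__ == "__main__":
    FILES, SAVED = 15_900, 7_950  # placeholder per-employee volumes, NOT from the article
    base = annual_file_copies(100, FILES, SAVED)
    adjusted = annual_file_copies(100, FILES, SAVED, local_clients=1, second_device_share=0.10)
    print(f"sketch: {base:,.0f} -> {adjusted:,.0f} ({adjusted / base - 1:.0%} more)")

    # The article's reported figures for a 100-person company.
    reported_increase = 5_690_999 / 3_978_493 - 1
    print(f"reported: {reported_increase:.1%} more")
```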

Conservatively, then, the amount of file duplication caused by email is the number of employees multiplied by the sum of two terms: the number of email servers, archives, and local email clients times the number of files exchanged by email per employee, plus the number of files each employee saves from email to local devices.
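Written as a formula, using symbols of our own choosing: N employees, S email servers, A archives, C local email clients, F attachment files exchanged by email per employee per year, and D files each employee saves from email to local devices per year. The article gives the daily inputs behind F (53 emails with attachments times 1.2 files) but not the number of working days W, which is left symbolic here; the base Calculator corresponds to C = 0.

```latex
\text{annual file copies} = N \left[ (S + A + C)\,F + D \right],
\qquad F = 53 \times 1.2 \times W
```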

Why email data sprawl is such a big deal

Cyber risk is recognized as the single greatest threat to the world economy over the next decade. More and more resources are being diverted into digital security, away from productive and socially imperative investments like education, health, and research. Yet despite the increase in security investment, the cybersecurity situation continues to deteriorate. As we have shown here, file sharing through email attachments creates a massive attack surface, one so broad and deep that attempts at better security are futile. Files sent through email are a particularly pernicious challenge. As a medium of file exchange, email’s 50-year-old design leaves data completely unprotected, widely dispersed, uncontrolled, and irrevocable. A breach of an email system or a user’s inbox often goes unnoticed and grants unrestricted access to all of its contents. Email effectively guarantees that company IP, trade secrets, or regulated user data are easily available, whether from a remote employee’s mobile device, workstation, or laptop, or even from a supplier’s email server. This data is fodder used and reused over a long horizon of cybercriminal attacks; it is valuable and traded on the dark web. Beyond regulatory fines, reputational damage, and lost IP, needless duplication of data also incurs costs in storage infrastructure and even carbon emissions, subjects we will explore in upcoming articles.

In our increasingly digital world, accelerated by a pandemic, remote collaboration is the norm for the foreseeable future. As organizations disperse their workforces geographically, the need to collaborate digitally intensifies. Email’s rampant data proliferation model, purpose-built to ensure delivery in an age when connectivity was uncertain, is wholly inadequate for today’s reality, in which data insecurity stems from over-accessibility. Moving to modern file sharing, such as secure cloud storage links (e.g., Box, OneDrive), is paramount. Sharing files as links to a single copy in a single repository avoids unnecessary data replication while providing the access restrictions and visibility missing from the legacy email attachment model. Technologies like mxHero smooth the transition from email attachments to cloud storage links without requiring end-user effort.

This graph compares annual file duplication resulting from email attachments vs. cloud storage file links (e.g., Box, OneDrive) for an organization with 100 users. Technologies like mxHero automatically ensure that all attachments are shared as cloud storage links. The result is a dramatic reduction in data sprawl and concomitant data risk. Source: mxHero ROI Calculator
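The graph’s comparison can be approximated per shared file. This is a hypothetical model of our own, not the Calculator’s: it assumes a linked file is stored exactly once in the repository (ignoring the storage provider’s internal replication) and that recipients save local copies just as often as they would with attachments, which likely overstates copies under the link model.

```python
def copies_per_shared_file(parties: int, stores_per_party: int, local_saves: int, as_link: bool) -> int:
    """Copies of one file shared with `parties` people (sender included).

    Attachment: every party's mail stores (servers, archives, clients) hold a copy.
    Link: the message carries only a URL; the file lives once in the repository.
    Local saves are counted the same way in both cases (hypothetical assumption).
    """
    if as_link:
        return 1 + local_saves
    return parties * stores_per_party + local_saves


if __name__ == "__main__":
    # Hypothetical email: sender + 3 recipients, 3 mail stores each, 2 local saves.
    print(copies_per_shared_file(4, 3, 2, as_link=False))  # 14
    print(copies_per_shared_file(4, 3, 2, as_link=True))   # 3
```

The gap widens with every extra recipient or mail store, because only the attachment branch multiplies by `parties * stores_per_party`.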

Unseen dangers

Often the things that most undermine us are the ones we don’t see. Such is the case with email. Email has long been so intrinsic to our work that, as with plumbing, we don’t really know what goes on behind the walls. That is why we built the mxHero Calculator: to help organizations peer through their own walls and into the plumbing … and there’s the leak.

Alex Panagides
https://mxhero.com

Alex Panagides is a well-known email technology pioneer and the founder and chief executive officer of mxHero, a Silicon Valley start-up providing cutting-edge solutions to support and enhance email for all. Alex launched mxHero in 2012 alongside a highly skilled team to address the email issues companies face every day, such as the growing volume and size of emails, virus and security threats, and global accessibility. The mxHero team continually innovates to address email and data storage challenges for businesses and individuals. Alex previously co-founded one of today’s leading email technology companies in Brazil, Inova International Inc., which grew to serve government agencies, telecom providers, and multinationals, among other organizations in the region. In addition to his work as an IT specialist with a mind for solving real-world email pain points, Alex has served as a consultant to the World Bank in Washington, D.C., and Brazil. In all, Alex brings more than 25 years of technical, operational, and managerial leadership and vision to mxHero, where he has established partnerships with today’s leading companies, including Google, Box, Dropbox, Microsoft, and Citrix.
