One reason the Treasury Inspector General for Tax Administration considers identity theft to be the crime of the century is the IRS. The Internal Revenue Service makes growing demands for information about people’s businesses and private lives every day. There is no such thing as personal privacy these days. That the IRS sends citizens a so-called “Privacy Act Notice” in all its mailings is a farce. The IRS lays claim to your data without court authority more than any other government agency. And to make matters worse, it shares the data with any other federal, state or local government agency claiming an interest, including foreign governments.
A river of data
In 2019, about 152 million individual tax returns will be filed with the IRS, along with roughly another 100 million business tax returns. There will be millions more miscellaneous tax returns, including trust, estate and gift tax returns. On top of that, over 3.6 BILLION information returns (Forms W-2, 1099, etc.) will be filed. There is quite literally a river of data flowing into the agency. The flow cannot be stopped, and as far as the IRS is concerned, it needs even more.
For example, one of the six “Strategic Goals” presented in the IRS’ 2018-2022 Strategic Plan is to increase its access to data, and use that data more effectively to drive its agency-wide decision making, as well as case evaluations and selections for enforcement purposes. See: IRS Publication 3744 (4-2018). This is consistent with the IRS goal of becoming a “data driven agency.”
The IRS is awash in data. The 2018-2022 Strategic Plan boasts that the IRS’ volume of data was 100 times larger in 2017 than it was 10 years prior. In 2018, the IRS Criminal Investigation unit alone collected 1.67 terabytes of data from various sources. A terabyte is 1,099,511,627,776 bytes, or 1,024 gigabytes of data. I’m told that approximately 900,000 plain text files can fit into a single gigabyte. The number of users in the IRS with access to that data has increased 23 times (Strategic Plan, p. 19) in the past 10 years.
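To put those figures together, here is a rough back-of-the-envelope sketch in Python. It assumes the binary definition of a terabyte given above and takes the secondhand estimate of 900,000 plain text files per gigabyte at face value. The constant and function names are made up for illustration and come from no IRS document.

```python
# Back-of-the-envelope arithmetic using only the figures quoted above.
BYTES_PER_TERABYTE = 1_099_511_627_776   # 2**40, the binary definition
GIGABYTES_PER_TERABYTE = 1_024
TEXT_FILES_PER_GIGABYTE = 900_000        # assumption: secondhand estimate
CI_TERABYTES_2018 = 1.67                 # Criminal Investigation, 2018


def terabytes_to_bytes(terabytes):
    """Convert terabytes to bytes."""
    return terabytes * BYTES_PER_TERABYTE


def terabytes_to_text_files(terabytes):
    """Estimate how many plain text files would fit in the given volume."""
    return terabytes * GIGABYTES_PER_TERABYTE * TEXT_FILES_PER_GIGABYTE


if __name__ == "__main__":
    total_bytes = terabytes_to_bytes(CI_TERABYTES_2018)
    total_files = terabytes_to_text_files(CI_TERABYTES_2018)
    print(f"{CI_TERABYTES_2018} TB is about {total_bytes:,.0f} bytes")
    print(f"That is room for roughly {total_files:,.0f} plain text files")
```

Under those assumptions, the 1.67 terabytes collected by Criminal Investigation alone would hold roughly 1.5 billion plain text files.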
Managing massive data
How do you manage, process and assimilate such a massive amount of data to the point where it becomes usable? The 2018-2022 Strategic Plan expresses the goal to “invest in analytics and visualization software and tools, and develop processes to support analytics in IRS operations” (p. 20). The end game is presented in these words:
Advancements in how data is collected, stored, accessed and analyzed will allow us to deploy data better. We’ll standardize our data processes and protocols and encourage collaboration among all IRS business units. Increased interoperability of data systems and sources will enhance the secure and seamless flow of data to enable greater authorized access to information. We’ll invest in training to develop more advanced analytics skill sets across the IRS, and use data to improve our business processes. (Strategic Plan, p. 19.)
The investment in analytics was recently undertaken – in a big way.
Big Government, meet Big Data
On Sept. 27, 2018, the IRS entered into a contract with Palantir Technologies of Palo Alto, California, to handle the task of data assimilation. The contract calls for Palantir to provide hardware, software and training to IRS employees to “capture, curate, store, search, share, transfer, perform deconfliction, analyze and visualize large amounts of disparate structured and unstructured data.” (IRS Contract Proposal, Performance Work Statement, Jan. 11, 2017, p. 1.)