Big data, a general term for the massive amount of digital data being collected from all sorts of sources, is too large, raw, or unstructured for analysis through conventional relational database techniques. Almost 90% of the world's data today was generated during the past two years, with 2.5 quintillion bytes of data added each day. Moreover, approximately 90% of it is unstructured. Still, the overwhelming amount of big data from the Web and the cloud offers new opportunities for discovery, value creation, and rich business intelligence for decision support in any organization. Big data also means new challenges involving complexity, security, and risks to privacy, as well as a need for new technology and human skills. Big data is redefining the landscape of data management, from extract, transform, and load, or ETL, processes to new technologies (such as Hadoop) for cleansing and organizing unstructured data in big-data applications.
Although the business sector is leading big-data-application development, the public sector has begun to derive real-time insight for decision making from fast-growing in-motion data from multiple sources, including the Web, biological and industrial sensors, video, email, and social communications. Many white papers, journal articles, and business reports have proposed ways governments can use big data to help them serve their citizens and overcome national challenges (such as rising health care costs, job creation, natural disasters, and terrorism). There is also some skepticism as to whether big data can actually improve government operations, as governments must develop new capabilities and adopt new technologies (such as Hadoop and NoSQL) to transform it into information through data organization and analytics.
An additional big data security challenge is that big data programming tools, including Hadoop and NoSQL databases, were not originally designed with security in mind. For example, Hadoop originally didn’t authenticate services or users, and didn’t encrypt data transmitted between nodes in the environment. This creates vulnerabilities in authentication and network security. NoSQL databases lack some of the security features provided by traditional databases, such as role-based access control. The advantage of NoSQL is the flexibility to include new data types on the fly, but defining security policies for this new data is not straightforward with these technologies. So what can be done to help bring the security of traditional database management to big data? Several organizations describe and define different security controls.
How to Secure Big Data
Application Software Security. Use secure versions of open-source software. As described above, big data technologies weren’t originally designed with security in mind. Using open-source technologies like Apache Accumulo or Hadoop version 0.20.20x or later can help address this challenge. In addition, vendor technologies like Cloudera Sentry or DataStax Enterprise offer enhanced security at the application layer. Specifically, Sentry and Accumulo also support role-based access control to enhance security for NoSQL databases.
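The role-based access control idea mentioned above can be illustrated with a minimal Python sketch. The role names, users, resources, and the `is_allowed` helper are all hypothetical; this is not the API of Sentry, Accumulo, or any other product, only an illustration of granting permissions to roles rather than to individual users.

```python
# Hypothetical role definitions: each role maps to a set of (resource, action) pairs.
ROLE_PERMISSIONS = {
    "analyst": {("sales_table", "read")},
    "engineer": {("sales_table", "read"), ("sales_table", "write")},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {
    "alice": {"analyst"},
    "bob": {"engineer"},
}


def is_allowed(user, resource, action):
    """Return True if any role held by the user grants the action on the resource."""
    for role in USER_ROLES.get(user, set()):
        if (resource, action) in ROLE_PERMISSIONS.get(role, set()):
            return True
    # Unknown users and unlisted permissions are denied by default.
    return False
```

Denying by default means that a new data type added on the fly stays inaccessible until a role is explicitly granted permission for it.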
Maintenance, Monitoring, and Analysis of Audit Logs. Implement audit logging technologies to understand and monitor big data clusters. Technologies like Apache Oozie can help implement this feature. Keep in mind that security engineers in the organization need to be tasked with examining and monitoring these logs. It’s important to ensure that logs are audited, maintained, and analyzed consistently across the enterprise.
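As one illustration of what analyzing audit logs might involve, the Python sketch below counts denied requests per user and flags users who exceed a threshold. The whitespace-separated `key=value` line format, the field names `allowed` and `ugi`, and the threshold of 3 are assumptions made for this example; real audit logs differ by product and configuration.

```python
from collections import Counter


def denied_requests_by_user(lines):
    """Count denied requests per user in an assumed key=value audit log format."""
    counts = Counter()
    for line in lines:
        # Parse every whitespace-separated token that looks like key=value.
        fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
        if fields.get("allowed") == "false":
            counts[fields.get("ugi", "unknown")] += 1
    return counts


def flag_users(lines, threshold=3):
    """Return the sorted users whose denied-request count reaches the threshold."""
    counts = denied_requests_by_user(lines)
    return sorted(user for user, n in counts.items() if n >= threshold)


# Hypothetical sample log lines, invented for illustration only.
SAMPLE_LOG = [
    "2024-01-01T10:00:00 allowed=false ugi=mallory cmd=open src=/secure/a",
    "2024-01-01T10:00:05 allowed=false ugi=mallory cmd=open src=/secure/b",
    "2024-01-01T10:00:09 allowed=false ugi=mallory cmd=open src=/secure/c",
    "2024-01-01T10:01:00 allowed=true ugi=bob cmd=open src=/data/x",
    "2024-01-01T10:02:00 allowed=false ugi=alice cmd=delete src=/data/y",
]
```

Calling `flag_users(SAMPLE_LOG)` on the sample above would single out the one user with repeated denials, which a security engineer could then investigate.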
Secure Configurations for Hardware and Software. Build servers based on secure images for all systems in your organization’s big data architecture. Ensure patching is up to date on these machines and that administrative privileges are limited to a small number of users. Use automation frameworks, like Puppet, to automate system configuration and ensure that all big data servers in the enterprise are uniform and secure.
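The uniformity goal described above can be sketched as a simple drift check in Python: compare each server's settings against a secure baseline and report the differences. The setting names, values, and the `find_drift` helper are hypothetical; a configuration management tool such as Puppet would enforce the baseline rather than merely report on it.

```python
# Hypothetical secure baseline; the keys and values are illustrative only.
BASELINE = {
    "ssh_root_login": "no",
    "firewall": "enabled",
    "patch_level": "2024-01",
}


def find_drift(baseline, actual):
    """Return {setting: (expected, found)} for every setting that differs from baseline.

    A setting missing from `actual` is reported with a found value of None.
    """
    return {
        key: (expected, actual.get(key))
        for key, expected in baseline.items()
        if actual.get(key) != expected
    }


def audit_servers(baseline, servers):
    """Map each non-compliant server name to its drift report."""
    report = {}
    for name, settings in servers.items():
        drift = find_drift(baseline, settings)
        if drift:
            report[name] = drift
    return report
```

A server that matches the baseline produces no entry in the report, so an empty result indicates that all checked servers are uniform with respect to the listed settings.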
Account Monitoring and Control. Manage accounts for big data users. Require strong passwords, deactivate inactive accounts, and impose a maximum permitted number of failed log-in attempts to help stop attackers from gaining access to a cluster. It’s important to note that the enemy isn’t always outside of the organization. Monitoring account access can help reduce the probability of a successful compromise from the inside.
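Two of the account controls above, locking out after repeated failed log-ins and deactivating inactive accounts, can be sketched in Python as follows. The limit of 5 attempts, the 90-day inactivity window, and the `Account` class are assumptions chosen for illustration, not values the controls themselves prescribe.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_FAILED_ATTEMPTS = 5                 # assumed lockout threshold
INACTIVITY_LIMIT = timedelta(days=90)   # assumed inactivity window


@dataclass
class Account:
    username: str
    last_login: datetime
    failed_attempts: int = 0
    active: bool = True


def record_failed_login(account):
    """Increment the failure counter and lock the account once the limit is reached.

    Returns True while the account is still active, False once it is locked.
    """
    account.failed_attempts += 1
    if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
        account.active = False
    return account.active


def record_successful_login(account, now):
    """Reset the failure counter and update the last log-in time."""
    account.failed_attempts = 0
    account.last_login = now


def deactivate_inactive(accounts, now):
    """Deactivate accounts idle longer than the limit; return their usernames."""
    deactivated = []
    for account in accounts:
        if account.active and now - account.last_login > INACTIVITY_LIMIT:
            account.active = False
            deactivated.append(account.username)
    return deactivated
```

In practice these checks would usually be delegated to a directory service or identity provider rather than implemented by hand; the sketch only shows the logic of the policy.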
Organizations that are serious about big data security should consider these first steps. Cyber criminals will not stop attacking, and with such a big target to protect, it is prudent for any enterprise using big data technologies to be as proactive as possible in securing its data.