Big data, a general term for the massive amount of digital
data being collected from all sorts of sources, is too large, raw, or
unstructured for analysis through conventional relational database techniques.
Almost 90% of the world's data today was generated during the past two years,
with 2.5 quintillion bytes of data added each day. Moreover, approximately 90%
of it is unstructured. Still, the overwhelming amount of big data from the Web
and the cloud offers new opportunities for discovery, value creation, and rich
business intelligence for decision support in any organization. Big data also
means new challenges involving complexity, security, and risks to privacy, as
well as a need for new technology and human skills. Big data is redefining the
landscape of data management, from extract, transform, and load, or ETL,
processes to new technologies (such as Hadoop) for cleansing and organizing
unstructured data in big-data applications.
Although the business sector is leading the development of big data
applications, the public sector has begun to derive insight from fast-growing
in-motion data to support real-time decision making. That data comes from
multiple sources, including the Web, biological and industrial sensors, video,
email, and social communications. Many white papers, journal articles, and business
reports have proposed ways governments can use big data to help them serve
their citizens and overcome national challenges (such as rising health care
costs, job creation, natural disasters, and terrorism). There is also some
skepticism as to whether big data can actually improve government operations,
because governments must develop new capabilities and adopt new technologies
(such as Hadoop and NoSQL) to transform the data into information through data
organization and analytics.
A further security challenge is that big data tools, including Hadoop and NoSQL
databases, were not originally designed with security in mind. For example,
Hadoop originally didn’t authenticate services or users, and didn’t encrypt
data transmitted between nodes in the environment. This creates vulnerabilities
in authentication and network security. NoSQL databases also lack some of the
security features provided by traditional databases, such as role-based access control.
The advantage of NoSQL is that it allows for the flexibility to include new
data types on the fly, but defining security policies for this new data is not
straightforward with these technologies. So what can be done to help bring
the security of traditional database management to big data? Several
organizations describe and define different security controls.
How to Secure Big Data
Application Software Security. Use secure versions of open-source
software. As described above, big data technologies weren’t originally designed
with security in mind. Using open-source technologies like Apache Accumulo or
version 0.20.20x or later of Hadoop can help address this challenge. In addition,
proprietary technologies like Cloudera Sentry or DataStax Enterprise offer
enhanced security at the application layer. Specifically, Sentry and Accumulo
support role-based access control to enhance security for NoSQL databases.
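To make the idea of role-based access control concrete, here is a minimal Python sketch. It is not the Sentry or Accumulo API; the class, the role names, and the "clickstream" resource are all hypothetical and only illustrate the deny-by-default pattern those tools are built around.

```python
from dataclasses import dataclass, field


@dataclass
class RoleBasedAccessControl:
    """Toy access control table (hypothetical, not a Sentry or Accumulo API)."""

    # role name -> set of (resource, action) pairs the role may perform
    role_permissions: dict = field(default_factory=dict)
    # user name -> set of role names assigned to that user
    user_roles: dict = field(default_factory=dict)

    def grant(self, role, resource, action):
        self.role_permissions.setdefault(role, set()).add((resource, action))

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, resource, action):
        # Deny by default: access requires at least one role with the permission.
        return any(
            (resource, action) in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )


rbac = RoleBasedAccessControl()
rbac.grant("analyst", "clickstream", "read")
rbac.grant("engineer", "clickstream", "read")
rbac.grant("engineer", "clickstream", "write")
rbac.assign("alice", "analyst")
rbac.assign("bob", "engineer")

print(rbac.is_allowed("alice", "clickstream", "read"))   # True
print(rbac.is_allowed("alice", "clickstream", "write"))  # False
print(rbac.is_allowed("carol", "clickstream", "read"))   # False
```

Permissions attach to roles rather than to individual users, so adding a new data set means updating a handful of roles instead of every account.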
Maintenance, Monitoring, and Analysis of Audit Logs.
Implement audit logging technologies to understand and monitor big data
clusters. Technologies like Apache Oozie can help implement this feature. Keep
in mind that security engineers in the organization need to be tasked with
examining and monitoring the resulting log files. It’s important to ensure that
logs are audited, maintained, and analyzed consistently across the enterprise.
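As a rough illustration of what analyzing audit logs can look like, the Python sketch below counts denied operations per user. The log format (one JSON object per line with "user", "action", "resource", and "allowed" fields), the sample records, and the review threshold are assumptions made for this example, not the output of Oozie or any other tool.

```python
import json
from collections import Counter

# Hypothetical audit records, one JSON object per line (assumed format).
SAMPLE_AUDIT_LOG = """
{"user": "alice", "action": "read", "resource": "/data/sales", "allowed": true}
{"user": "mallory", "action": "read", "resource": "/data/payroll", "allowed": false}
{"user": "mallory", "action": "write", "resource": "/data/payroll", "allowed": false}
{"user": "bob", "action": "read", "resource": "/data/payroll", "allowed": false}
"""


def denied_counts(log_text):
    """Count denied operations per user from JSON-lines audit records."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if not record["allowed"]:
            counts[record["user"]] += 1
    return counts


def users_to_review(log_text, threshold=2):
    """Return users with at least `threshold` denied operations, sorted by name."""
    counts = denied_counts(log_text)
    return sorted(user for user, n in counts.items() if n >= threshold)


print(users_to_review(SAMPLE_AUDIT_LOG))  # ['mallory']
```

A report like this is only useful if someone is assigned to read it, which is why the staffing point above matters as much as the tooling.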
Secure Configurations for Hardware and Software. Build
servers based on secure images for all systems in your organization’s big data
architecture. Ensure patching is up to date on these machines and that
administrative privileges are limited to a small number of users. Use automation
frameworks, like Puppet, to automate system configuration and ensure that all
big data servers in the enterprise are uniform and secure.
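The goal of uniform servers can be checked with a simple drift report. The Python sketch below compares each server's settings against a secure baseline; the setting names, server names, and values are hypothetical, and in practice a tool like Puppet would both detect and correct the differences.

```python
# Hypothetical secure baseline; the setting names are illustrative only.
BASELINE = {
    "ssh_root_login": "no",
    "firewall_enabled": "yes",
    "auto_security_updates": "yes",
}


def find_drift(baseline, actual):
    """Return {setting: (expected, found)} for every setting that differs."""
    drift = {}
    for key, expected in baseline.items():
        found = actual.get(key)  # None means the setting is missing entirely
        if found != expected:
            drift[key] = (expected, found)
    return drift


# Hypothetical inventory of settings collected from two cluster nodes.
servers = {
    "node-01": {
        "ssh_root_login": "no",
        "firewall_enabled": "yes",
        "auto_security_updates": "yes",
    },
    "node-02": {
        "ssh_root_login": "yes",
        "firewall_enabled": "yes",
    },
}

report = {name: find_drift(BASELINE, cfg) for name, cfg in servers.items()}
for name, drift in report.items():
    print(name, "OK" if not drift else drift)
```

Here node-02 is flagged twice: root login over SSH is enabled, and the automatic update setting is missing.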
Account Monitoring and Control. Manage accounts for big data
users. Require strong passwords, deactivate inactive accounts, and impose a
maximum permitted number of failed log-in attempts to help stop attackers from
gaining access to a cluster. It’s important to note that the enemy isn’t always
outside of the organization. Monitoring account access can help reduce the
probability of a successful compromise from the inside.
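The three account controls above (strong passwords, deactivating inactive accounts, and a cap on failed log-ins) can be sketched in a few lines of Python. The thresholds, the password rule, and the Account class are assumptions chosen for illustration; a real deployment would enforce these policies in its directory or identity service.

```python
from datetime import datetime, timedelta

# Assumed policy values for this sketch.
MAX_FAILED_ATTEMPTS = 5
INACTIVITY_LIMIT = timedelta(days=90)
MIN_PASSWORD_LENGTH = 12


def is_strong_password(password):
    """Require minimum length plus a lowercase letter, an uppercase letter, and a digit."""
    return (
        len(password) >= MIN_PASSWORD_LENGTH
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
    )


class Account:
    """Hypothetical account record that tracks log-in failures and last activity."""

    def __init__(self, name, last_login):
        self.name = name
        self.last_login = last_login
        self.failed_attempts = 0
        self.locked = False

    def record_login(self, success, now):
        """Return True if the log-in is accepted; lock after too many failures."""
        if self.locked:
            return False
        if success:
            self.failed_attempts = 0
            self.last_login = now
            return True
        self.failed_attempts += 1
        if self.failed_attempts >= MAX_FAILED_ATTEMPTS:
            self.locked = True
        return False

    def is_inactive(self, now):
        return now - self.last_login > INACTIVITY_LIMIT


now = datetime(2024, 1, 1)
account = Account("dave", last_login=now - timedelta(days=10))
for _ in range(MAX_FAILED_ATTEMPTS):
    account.record_login(success=False, now=now)

stale = Account("erin", last_login=now - timedelta(days=120))

print(account.locked)                               # True
print(account.record_login(success=True, now=now))  # False, the account is locked
print(stale.is_inactive(now))                       # True
print(is_strong_password("password"))               # False
```

Once locked, the account rejects even a correct password, so a separate and audited unlock step is needed.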
Organizations that are serious about big data security should consider these
first steps. Cyber criminals will not stop attacking, and with such a big
target to protect, it is prudent for any enterprise that uses big data
technologies to be as proactive as possible in securing its data.
Completely agree. Big data security protects the confidentiality and integrity of sensitive data used for analytics and business intelligence.