Hadoop Distributed File System (HDFS)

The Hadoop Distributed File System (HDFS) is a distributed file system which is intended to consistently and effectively store enormous volumes of data throughout a cluster of commodity hardware. It is an essential part of the big data processing and analysis ecosystem known as Apache Hadoop. High throughput, scalability, and fault tolerance are all considered in the design of the HDFS architecture.

Where to use HDFS

Very Large Files: Files should be of hundreds of megabytes, gigabytes or more.
Streaming Data Access: The time to read the whole data set is more important than latency in reading the first. HDFS is built on a write-once and read-many-times pattern.
Commodity Hardware:It works on low cost hardware. Where not to use HDFS
Low Latency data access: Applications that require very less time to access the first data should not use HDFS as it is giving importance to whole data rather than time to fetch the first record.
Lots Of Small Files:The name node contains the metadata of files in memory and if the files are small in size it takes a lot of memory for the name node's memory which is not feasible.
Multiple Writes:It should not be used when we have to write multiple times.

Community

Join WhatsApp Grpup using https://chat.whatsapp.com/EAcqRurEOXb52Ax7Tlmj9I

Hadoop Distributed File System (HDFS)

Where to use HDFS

Happy Exploring!

No comments:

Post a Comment

About BunksAllowed

Coding Challenges

Socialize

Categories

Followers

BunksAllowed

Comments

Subscribe To

Total Pageviews

Blog Archive

Categories

Recent Posts

Popular Posts

Subscribe Us

Quick Contact

Translate

Archive

Follow Us

We Acknowledge

PEXELS

Recent Tutorials

Contact Form

Categories