What is HDFS?
Where to use HDFS
Very Large Files: HDFS suits files that are hundreds of megabytes, gigabytes, or larger.
Streaming Data Access: The time to read the whole data set matters more than the latency of reading the first record. HDFS is built around a write-once, read-many-times access pattern.
Commodity Hardware: It runs on low-cost commodity hardware.
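To make the "very large files" point concrete, here is a rough sketch of how a file maps onto HDFS blocks. It assumes a block size of 128 MB (the default value of `dfs.blocksize` in recent Hadoop releases; a cluster may be configured differently), and the function name `block_count` is hypothetical.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size: 128 MB


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Return how many blocks a file of the given size would occupy."""
    # Integer ceiling division: a partially filled last block still counts.
    return (file_size_bytes + block_size - 1) // block_size


one_gb = 1024 ** 3
print(block_count(one_gb))      # 8
print(block_count(10 * 1024))   # 1
```

A 1 GB file spans only a handful of blocks, while a 10 KB file still needs a block entry of its own, which is why HDFS favors a few large files over many tiny ones.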
Where not to use HDFS
Low-Latency Data Access: Applications that need very fast access to the first record should not use HDFS, because it prioritizes throughput over the whole data set rather than the time to fetch the first record.
Lots of Small Files: The NameNode keeps file metadata in memory, so a very large number of small files consumes a disproportionate amount of NameNode memory, which is not feasible at scale.
Multiple Writes: HDFS should not be used when the same data has to be written or modified multiple times, since it follows a write-once model.
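The small-files problem can be estimated with some back-of-the-envelope arithmetic. The sketch below assumes roughly 150 bytes of NameNode memory per namespace object (one per file plus one per block); this is a commonly quoted rule of thumb, not an exact figure, and the function name is hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not an exact measurement


def namenode_bytes(num_files, blocks_per_file):
    """Estimate NameNode memory for num_files files of blocks_per_file blocks each."""
    objects = num_files * (1 + blocks_per_file)  # one file entry + its blocks
    return objects * BYTES_PER_OBJECT


# The same 1 GB of data, stored two different ways:
one_big_file = namenode_bytes(num_files=1, blocks_per_file=8)
many_small_files = namenode_bytes(num_files=1024 * 1024, blocks_per_file=1)

print(one_big_file)                        # 1350
print(many_small_files // (1024 * 1024))   # 300 (MB)
```

Under these assumptions, a million 1 KB files cost the NameNode about 300 MB of memory, while a single 1 GB file costs about a kilobyte.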
Key Characteristics:
- Scalability: HDFS is designed to scale horizontally, allowing the addition of more nodes to handle growing amounts of data.
- Fault Tolerance: HDFS ensures data durability by replicating each block of data across multiple nodes. If a node fails, data can still be retrieved from replicas.
- Data Locality: HDFS aims to move computation close to the data by letting jobs run on the nodes where the data is already stored. This reduces data transfer time.
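The fault tolerance and data locality points above can be illustrated with a toy block-placement map. This is only a sketch: the node names, block IDs, and both helper functions are made up for illustration and are not part of any Hadoop API.

```python
def surviving_replicas(placement, failed):
    """Map each block to the replicas that remain on live nodes."""
    return {block: [n for n in nodes if n not in failed]
            for block, nodes in placement.items()}


def local_node(block, placement, candidates):
    """Pick a candidate compute node that already stores the block, if any."""
    for node in candidates:
        if node in placement[block]:
            return node
    return None  # no local replica; the data would have to cross the network


placement = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node2", "node3", "node4"],
}

alive = surviving_replicas(placement, failed={"node2"})
print(alive["blk_1"])                                     # ['node1', 'node3']
print(all(alive.values()))                                # True
print(local_node("blk_2", placement, ["node1", "node4"]))  # node4
```

With three replicas per block, losing one node still leaves every block readable, and a scheduler can usually find a node that holds the block it needs.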
HDFS Architecture: Components of Hadoop Distributed File System
Data Replication:
HDFS Operations:
Write Operation:
- Client Request: The client communicates with the NameNode to create a new file.
- Block Allocation: The NameNode allocates data blocks and provides a list of DataNodes to the client.
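- Write Data: The client writes the data directly to the first DataNode in the list, without passing the data through the NameNode.
- Block Replication: Each DataNode forwards the data to the next DataNode in the list, so every block ends up replicated across multiple DataNodes for fault tolerance.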
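The write steps above can be sketched as a small in-memory simulation. This is a toy model, not the real protocol: `ToyNameNode` and `write_file` are hypothetical names, the 4-byte block size is chosen only to keep the demo small, the replication factor of 3 mirrors the usual `dfs.replication` default, and round-robin placement stands in for HDFS's rack-aware placement policy.

```python
class ToyNameNode:
    """Keeps only metadata: which blocks make up a file and where they live."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.files = {}          # path -> list of (block_id, [datanode names])
        self._next_block = 0

    def create(self, path):
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = []

    def allocate_block(self, path):
        block_id = f"blk_{self._next_block}"
        start = self._next_block % len(self.datanodes)
        # Round-robin placement (an assumption made for this sketch).
        targets = [self.datanodes[(start + i) % len(self.datanodes)]
                   for i in range(self.replication)]
        self._next_block += 1
        self.files[path].append((block_id, targets))
        return block_id, targets


def write_file(namenode, storage, path, data, block_size=4):
    namenode.create(path)                                    # client request
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        block_id, pipeline = namenode.allocate_block(path)   # block allocation
        # The chunk goes to the first DataNode and is passed along the
        # pipeline, so every target ends up holding a copy.
        for node in pipeline:
            storage[node][block_id] = chunk


nodes = ["dn1", "dn2", "dn3", "dn4"]
nn = ToyNameNode(nodes)
storage = {n: {} for n in nodes}
write_file(nn, storage, "/demo.txt", b"hello hdfs")
print(nn.files["/demo.txt"])
```

Note that the NameNode object never sees the file contents; it only records block IDs and locations, which matches the division of labor described in the steps.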
Read Operation:
- Client Request: The client communicates with the NameNode to read a file.
- Block Location: The NameNode provides the client with the locations of the required data blocks.
- Read Data: The client reads data directly from the identified DataNodes.
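The read steps above can be sketched in the same spirit. In this toy model, `block_map` plays the role of the NameNode's answer (block IDs and their locations) and `storage` plays the role of the DataNodes; both structures and the `read_file` function are hypothetical, and a missing entry in `storage` simulates a DataNode that is down.

```python
def read_file(block_map, storage, path):
    """Reassemble a file by fetching each block from the first reachable replica."""
    parts = []
    for block_id, locations in block_map[path]:    # block locations
        for node in locations:
            blocks = storage.get(node)             # None means the node is down
            if blocks is not None and block_id in blocks:
                parts.append(blocks[block_id])     # read directly from DataNode
                break
        else:
            raise IOError(f"no live replica for {block_id}")
    return b"".join(parts)


block_map = {
    "/demo.txt": [("blk_0", ["dn1", "dn2"]), ("blk_1", ["dn2", "dn3"])],
}
storage = {  # dn1 is absent, simulating a failed DataNode
    "dn2": {"blk_0": b"hello ", "blk_1": b"hdfs"},
    "dn3": {"blk_1": b"hdfs"},
}

print(read_file(block_map, storage, "/demo.txt"))  # b'hello hdfs'
```

Because the first replica of `blk_0` is unreachable, the read falls back to the copy on `dn2`, which is the behavior replication is meant to provide.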
HDFS Commands:
hadoop fs -copyFromLocal local_file_path hdfs://namenode_address/hdfs_file_path
hadoop fs -ls hdfs://namenode_address/hdfs_directory_path
hadoop fs -mkdir hdfs://namenode_address/hdfs_directory_path
hadoop fs -cat hdfs://namenode_address/hdfs_file_path
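If these commands need to be driven from a script, one option is a thin wrapper around the `hadoop` executable. The sketch below is an assumption-laden example, not an official client: `hadoop_fs_argv` and `run_hadoop_fs` are hypothetical helpers, only the four subcommands listed above are allowed, and the paths are the same placeholders used in the commands.

```python
import shutil
import subprocess

# Only the subcommands shown in this post.
ALLOWED = {"-copyFromLocal", "-ls", "-mkdir", "-cat"}


def hadoop_fs_argv(action, *paths):
    """Build the argument list for a `hadoop fs` command."""
    if action not in ALLOWED:
        raise ValueError(f"unsupported action: {action}")
    if not paths:
        raise ValueError("at least one path is required")
    return ["hadoop", "fs", action, *paths]


def run_hadoop_fs(action, *paths):
    """Run the command and return its standard output as text."""
    if shutil.which("hadoop") is None:
        raise RuntimeError("hadoop executable not found on PATH")
    result = subprocess.run(hadoop_fs_argv(action, *paths),
                            check=True, capture_output=True, text=True)
    return result.stdout


# Building the command does not require a Hadoop installation.
print(hadoop_fs_argv("-ls", "hdfs://namenode_address/hdfs_directory_path"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.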