[Hadoop] Hadoop Streaming

snoony 2024. 3. 7. 17:35

Download the book text as a file

[root@localhost hadoop-3.3.6]# wget https://www.gutenberg.org/cache/epub/73116/pg73116.txt

[root@localhost hadoop-3.3.6]# hadoop fs -mkdir -p /user/mapreduce
[root@localhost hadoop-3.3.6]# hadoop fs -ls /user
Found 5 items
drwxr-xr-x   - root supergroup          0 2024-03-05 16:37 /user/hadooptest
drwxr-xr-x   - root supergroup          0 2024-03-06 14:19 /user/hive
drwxr-xr-x   - root supergroup          0 2024-03-06 14:23 /user/kimnayoung
drwxr-xr-x   - root supergroup          0 2024-03-07 16:08 /user/mapreduce
drwxr-xr-x   - root supergroup          0 2024-03-07 12:35 /user/root

Store the downloaded file in /user/mapreduce/input

[root@localhost hadoop-3.3.6]# mkdir input
[root@localhost hadoop-3.3.6]# mv pg73116.txt ./input
[root@localhost hadoop-3.3.6]# ls input
pg73116.txt
[root@localhost hadoop-3.3.6]# hadoop fs -put ./input/ /user/mapreduce
[root@localhost hadoop-3.3.6]# hadoop fs -ls /user/mapreduce/input
Found 1 items
-rw-r--r--   1 root supergroup      47873 2024-03-07 16:20 /user/mapreduce/input/pg73116.txt

Hadoop Streaming

Check the input folder inside /user/mapreduce

[root@localhost hadoop-3.3.6]# hadoop fs -ls /user/mapreduce
Found 1 items
drwxr-xr-x   - root supergroup          0 2024-03-07 16:20 /user/mapreduce/input
[root@localhost hadoop-3.3.6]# hadoop fs -ls /user/mapreduce/input
Found 1 items
-rw-r--r--   1 root supergroup      47873 2024-03-07 16:20 /user/mapreduce/input/pg73116.txt

Run the job

[root@localhost hadoop-3.3.6]# hadoop jar ~/hadoop-3.3.6/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar \
-input /user/mapreduce/input \
-output /user/mapreduce/output \
-mapper ~/hadoop-3.3.6/mapper.py \
-reducer ~/hadoop-3.3.6/mapreduce.py
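
The post does not show what mapper.py and mapreduce.py contain, so the following is only a minimal word-count sketch of what such a pair of scripts might do. Hadoop Streaming feeds input lines to the mapper on stdin, sorts the mapper's tab-separated output by key, and feeds the sorted records to the reducer on stdin. The function names `map_lines` and `reduce_lines`, the `map`/`reduce` command-line argument, and the word pattern are all assumptions made for illustration, not the author's code.

```python
#!/usr/bin/env python3
"""Hypothetical word-count mapper and reducer for Hadoop Streaming."""
import re
import sys
from itertools import groupby

# Assumption: a "word" is a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in WORD_RE.findall(line.lower()):
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer step: sum the counts of consecutive records sharing a key.

    This relies on the input being sorted by key, which the shuffle
    phase provides before records reach the reducer.
    """
    pairs = (
        line.rstrip("\n").split("\t", 1)
        for line in lines
        if "\t" in line  # skip blank or malformed records
    )
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical CLI: the first argument selects the step to run.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

To match the command above, the two steps would presumably be split into separate files (the mapper part in mapper.py, the reducer part in mapreduce.py), each reading stdin and printing its records. Because the scripts are passed as paths to `-mapper` and `-reducer`, they most likely need a shebang line and execute permission (`chmod +x`). On a multi-node cluster, the scripts would typically also have to be shipped to the worker nodes with the `-files` option.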