Hadoop file IO

조규현15 2015. 1. 28. 11:01

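This part ran without problems in pseudo-distributed mode, before moving to a fully distributed cluster.

The HDFS file write worked as follows.

Read the existing file A line by line.

Store every line in a List<String> object with the add method.

Open file A again, this time as a writer object.

Write the List<String> back to file A line by line (adding '\n' at the end).

Write the new string to be appended.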

A.txt

Text to append

Resulting A.txt
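As a rough sketch of the write path described above, the Java below replays the same sequence: read everything, rewrite the file, then append the new string. It is not the original job's code. `RewriteAppend` and `appendByRewrite` are hypothetical names, and a local file accessed through `java.nio.file` stands in for the HDFS file so that the example runs without a cluster.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: a local file stands in for the HDFS file A.txt.
class RewriteAppend {

  // Reads the whole file, rewrites it from scratch, then adds newLine at the end.
  static void appendByRewrite(Path file, String newLine) throws IOException {
    // Read the existing file line by line and keep every line in a List<String>.
    List<String> lines = new ArrayList<>();
    for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
      lines.add(line);
    }

    // Re-open the same file as a writer; by default this truncates it.
    try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
      // Write the old lines back, each followed by '\n'.
      for (String line : lines) {
        writer.write(line);
        writer.write('\n');
      }
      // Write the new string after the old content.
      writer.write(newLine);
      writer.write('\n');
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("A", ".txt");
    Files.write(file, "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
    appendByRewrite(file, "third");
    System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
  }
}
```

Nothing in this sequence coordinates with other callers, so two tasks running it against the same file at the same time can each read the old content and then overwrite the other's result.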

However, once this job runs in distributed mode,

several tasks access A.txt at the same time, errors occur,

and the structure of A.txt is corrupted.

The fix:

Write each piece of text to append into its own separate file (named with a timestamp).
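A minimal sketch of the one-file-per-write idea might look like the following. The post only says a timestamp is used. The `taskId` suffix, the `<timestamp>-<taskId>.txt` naming scheme, and the names `TimestampedPartWriter`, `partName`, and `writePart` are assumptions added for illustration, and a local directory stands in for the HDFS directory so that the example runs without a cluster.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration: a local directory stands in for the HDFS directory A.
class TimestampedPartWriter {

  // Assumed naming scheme: <timestampMillis>-<taskId>.txt.
  // The taskId is not in the original post; it keeps two tasks that write in
  // the same millisecond from picking the same name.
  static String partName(long timestampMillis, String taskId) {
    return timestampMillis + "-" + taskId + ".txt";
  }

  // Writes text into a brand-new file under dir and returns its path.
  static Path writePart(Path dir, String taskId, String text) throws IOException {
    Path part = dir.resolve(partName(System.currentTimeMillis(), taskId));
    // CREATE_NEW throws if the file already exists instead of overwriting it.
    Files.write(part, (text + "\n").getBytes(StandardCharsets.UTF_8),
        StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    return part;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("A");
    System.out.println(writePart(dir, "task-0", "text to append"));
  }
}
```

Because every task creates a file that no other task touches, no two writers ever share a file.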

Next,

package folkstalk;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class ListDirectoryContents {
  public static void main(String[] args) throws IOException, URISyntaxException
  {
    //1. Get the Configuration instance
    Configuration configuration = new Configuration();
    //2. Get the instance of the HDFS
    FileSystem hdfs = FileSystem.get(new URI("hdfs://localhost:54310"), configuration);
    //3. Get the metadata of the desired directory
    FileStatus[] fileStatus = hdfs.listStatus(new Path("hdfs://localhost:54310/user/hadoop"));
    //4. Using FileUtil, getting the Paths for all the FileStatus
    Path[] paths = FileUtil.stat2Paths(fileStatus);
    //5. Iterate through the directory and display the files in it
    System.out.println("***** Contents of the Directory *****");
    for(Path path : paths)
    {
      System.out.println(path);
    }
  }
}

read the HDFS directory and build a dump from the files under directory A.
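The post does not show the merge step itself, so the following is only a sketch of how such a dump could be assembled. `DumpBuilder` and `buildDump` are hypothetical names, sorting the files by name is an assumption, and a local directory accessed through `java.nio.file` stands in for HDFS so that the example runs without a cluster. On a real cluster the listing would come from `FileSystem.listStatus`, as in the code above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical illustration: a local directory stands in for the HDFS directory A.
class DumpBuilder {

  // Concatenates every regular file directly under dir into dir/dump.
  static Path buildDump(Path dir) throws IOException {
    Path dump = dir.resolve("dump");

    // List the directory, skipping subdirectories and any earlier dump.
    List<Path> parts = new ArrayList<>();
    try (Stream<Path> entries = Files.list(dir)) {
      entries.filter(Files::isRegularFile)
             .filter(p -> !p.equals(dump))
             .forEach(parts::add);
    }
    // Assumption: merging in file-name order is acceptable.
    Collections.sort(parts);

    // A single writer produces the dump, so there is no concurrent access.
    try (BufferedWriter out = Files.newBufferedWriter(dump, StandardCharsets.UTF_8)) {
      for (Path part : parts) {
        for (String line : Files.readAllLines(part, StandardCharsets.UTF_8)) {
          out.write(line);
          out.write('\n');
        }
      }
    }
    return dump;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("A");
    Files.write(dir.resolve("1-task-0.txt"), "first\n".getBytes(StandardCharsets.UTF_8));
    Files.write(dir.resolve("2-task-1.txt"), "second\n".getBytes(StandardCharsets.UTF_8));
    Path dump = buildDump(dir);
    System.out.print(new String(Files.readAllBytes(dump), StandardCharsets.UTF_8));
  }
}
```

The design point is that the merge runs in one place, after the tasks have finished writing their own files, so the shared dump only ever has one writer.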

The contents of A/dump are as follows.

Conclusion

> In HDFS, be careful when more than one writer targets a single file.
