DataBaser.Net: Vector

1 Setting the number of significant digits
2 Selecting vector elements
3 Vectors and assignment
4 Vector arithmetic
5 Generating regular sequences
6 Logical vector
7 Missing values
8 Character vector
9 Index vector; selecting and modifying subsets of a data set
10 Other types of objects

1 Setting the number of significant digits #

> options(digits=3)
> pi
[1] 3.14
> options(digits=4)
> pi
[1] 3.142
>
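Note that digits controls the number of significant digits used for printing, not the number of decimal places, and it does not change the stored value. A minimal sketch of the difference, using only base R functions (the variable names are illustrative):

```r
# options(digits=) affects printing only; the stored value keeps full precision.
old <- options(digits = 3)   # remember the previous setting
shown <- format(pi)          # "3.14": three significant digits
big <- format(123.456)       # "123": still three significant digits, no decimals
options(old)                 # restore the previous setting

# To change the value itself or fix the decimals, be explicit.
r2 <- round(pi, 2)           # rounded to two decimal places
s2 <- signif(123.456, 2)     # rounded to two significant digits
f2 <- sprintf("%.2f", pi)    # string with exactly two decimals
```

Saving the result of options() and passing it back restores the earlier setting, which keeps the change from leaking into later output.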


2 Selecting vector elements #

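x <- 1:10

x[-1] # drop the first element
x[-(3:5)] # drop the 3rd through 5th elements
x[c(1,3,5)] # select only the 1st, 3rd, and 5th elements
x < 5 # TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
x[x < 5] # select the elements less than 5
x[x %% 2 == 1] # odd elements only
x[!is.na(x)] # only the elements that are not NA (is.null() tests the whole object, not each element)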


3 Vectors and assignment #

The vector of values 1, 2, 3 can be assigned to a variable in the following ways.

> # Method 1
> x <- c(1,2,3)
> x
[1] 1 2 3
>
> # Method 2
> assign("y", c(1,2,3))
> y
[1] 1 2 3
>
> # Method 3
> c(1,2,3) -> z
> z
[1] 1 2 3
> 
> # Simple arithmetic on a vector
> 1/z
[1] 1.0000000 0.5000000 0.3333333
>
> # Building a vector from variables that already hold values
> a <- c(z, 0, z)
> a
[1] 1 2 3 0 1 2 3
> # Method 4: using the scan() function
> x <- scan()
1: 1 2 3 4 5
6: 6 7 8 9 NA
11: 
Read 10 items
> x
 [1]  1  2  3  4  5  6  7  8  9 NA
> x <- scan()
1: 이 재 학
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  scan() expected 'a real', got '이'
> scan(what = "")
1: 이 재 학
4: 
Read 3 items
[1] "이" "재" "학"
> scan(what = complex(1)) # complex numbers
1: 1 2 3
4: 
Read 3 items
[1] 1+0i 2+0i 3+0i
> x <- scan(what = numeric())
1: 1 2 3
4: 
Read 3 items
> x
[1] 1 2 3
> # assuming the file c:\r_work.txt exists
> x <- scan(file = "c:\\r_work.txt")
Read 6 items
> x
[1] 1 2 3 4 5 6
> x <- scan(file = "c:\\r_work.txt", what = character(0)) # specify what
Read 6 items
> x
[1] "1" "2" "3" "4" "5" "6"
>
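scan() can also read from a string through its text argument, which makes it possible to try the same calls without typing at the prompt or creating a file. A minimal sketch; the input strings are invented for illustration:

```r
# quiet = TRUE suppresses the "Read N items" message.
nums <- scan(text = "1 2 3 4 5", quiet = TRUE)                        # numeric by default
words <- scan(text = "lee jae hak", what = character(), quiet = TRUE)  # read as strings
csv <- scan(text = "1,2,NA,4", sep = ",", quiet = TRUE)               # "NA" becomes a missing value
```

As in the interactive examples, the what argument decides the type of the result; without it scan() expects numbers.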



4 Vector arithmetic #

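When two vectors differ in length, the shorter one is reused from its beginning until it matches the longer one[1]. For 2*x + y + 1 with x = (1,2,3) and y = (1,2,3,0,1,2,3):

2*x => 2, 4, 6
2*x + y => 2+1, 4+2, 6+3, 2+0, 4+1, 6+2, 2+3 => 3, 6, 9, 2, 5, 8, 5
2*x + y + 1 => 3+1, 6+1, 9+1, 2+1, 5+1, 8+1, 5+1 => 4, 7, 10, 3, 6, 9, 6

(Tracing this by hand is a bit of a headache.) Because the longer length, 7, is not a multiple of the shorter length, 3, R also issues a warning.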

> # When two vectors differ in length, the shorter one is reused from its beginning.
> x <- c(1,2,3)
> y <- c(x,0,x)
> v <- 2*x + y + 1
Warning message:
In 2 * x + y :
  longer object length is not a multiple of shorter object length
> v
[1]  4  7 10  3  6  9  6
>
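The recycling rule can be checked on a case where the lengths divide evenly, and the warning from the uneven case can be caught in code. This is a sketch in base R; warned and the other names are illustrative:

```r
a <- c(1, 2, 3, 4, 5, 6)
b <- c(10, 20)
clean <- a + b   # b is reused three times, no warning

# Length 3 against length 7 still computes, but R signals a warning.
warned <- FALSE
v <- withCallingHandlers(
  c(1, 2, 3) + c(1, 2, 3, 0, 1, 2, 3),
  warning = function(w) {
    warned <<- TRUE                  # record that a warning occurred
    invokeRestart("muffleWarning")   # continue without printing it
  }
)
```

withCallingHandlers() lets the computation finish and return its value, so both the result and the fact that a warning was raised are available afterwards.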

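Functions can be applied to vectors as well. sum() returns the total: x is {1,2,3}, so sum(x) returns 1+2+3 = 6. length() returns the number of elements, so length(x) returns 3. mean() returns the average, so mean(x) returns 2. The first expression below combines the three to compute the sample variance of x.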

> sum((x-mean(x))^2)/(length(x)-1)
[1] 1
> sum(x)
[1] 6
> length(x)
[1] 3
> mean(x)
[1] 2
>
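The expression sum((x-mean(x))^2)/(length(x)-1) is the sample variance, which base R also provides as var(). A short sketch comparing the two:

```r
x <- c(1, 2, 3)

manual <- sum((x - mean(x))^2) / (length(x) - 1)  # sample variance written out by hand
builtin <- var(x)                                  # the same quantity from base R
spread <- sqrt(manual)                             # standard deviation, comparable to sd(x)
```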

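As another example, 1² + 2² + ... + 10² can be computed as follows. The parentheses are required: ^ has higher precedence than :, so 1:10^2 means 1:100 and sum(1:10^2) returns 5050.

> sum((1:10)^2)
[1] 385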
>
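Because ^ binds more tightly than :, parentheses change the meaning of this kind of expression. The sketch below contrasts the two readings and checks the sum of squares against the closed form n(n+1)(2n+1)/6:

```r
no_parens <- sum(1:10^2)     # 10^2 is evaluated first, so this sums 1..100
with_parens <- sum((1:10)^2) # squares each of 1..10, then sums

n <- 10
closed <- n * (n + 1) * (2 * n + 1) / 6  # closed form for 1^2 + ... + n^2
```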

The square root of a negative number cannot be computed with a real argument.

> sqrt(-17)
[1] NaN
Warning message:
In sqrt(-17) : NaNs produced

Instead, pass the argument as a complex number.

> sqrt(-17+0i)
[1] 0+4.123106i
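The parts of a complex result can be pulled out with Re(), Im() and Mod(). A small sketch, using as.complex() as an alternative to writing +0i:

```r
z <- sqrt(as.complex(-17))  # same result as sqrt(-17+0i)

re <- Re(z)   # real part
im <- Im(z)   # imaginary part, equal to sqrt(17)
m <- Mod(z)   # modulus (distance from 0)
sq <- z^2     # squaring gives back -17, up to rounding
```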


5 Generating regular sequences #

Sequences can be generated in the following ways.

> # Method 1: the simplest sequence
> 1:10 -> seq10
> seq10
 [1]  1  2  3  4  5  6  7  8  9 10
>
> # Method 2: specify the step with by
> seq(1, 10, by=2) ->seq10
> seq10
[1] 1 3 5 7 9
>
> # Method 3: specify the length instead of the end value
> seq10 <- seq(length.out=5, from=1, by=2)
> seq10
[1] 1 3 5 7 9
>
> # Method 4: repeating a sequence
> x <- 1:3
> s <- rep(x, times=5)
> s
 [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
> seq(c(1,3,2))
[1] 1 2 3
> sequence(c(1,3,2)) #{1}, {1,2,3}, {1,2}
[1] 1 1 2 3 1 2
>
> # Method 5: repeating each element
> s <- rep(x, each=5)
> s
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
> rep(3,3)
[1] 3 3 3
>
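A few more variations on seq() and rep() that follow the same pattern. This is a sketch with base R only; the names are illustrative:

```r
grid <- seq(0, 1, length.out = 5)   # five evenly spaced values from 0 to 1
down <- seq(10, 1, by = -3)         # a negative step counts downward
idx <- seq_along(c("a", "b", "c"))  # positions 1..length of the argument
empty <- seq_len(0)                 # zero-length result, whereas 1:0 counts down
mixed <- rep(1:2, times = c(3, 1))  # a separate repeat count per element
```

seq_len() and seq_along() are often preferred over 1:n in loops because they return an empty vector when n is 0.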


6 Logical vector #

Logical vectors are generated by conditions.

> x <- c(1,2,3,4,5)
> x > 3
[1] FALSE FALSE FALSE  TRUE  TRUE
>
> x <- c(1,1,0)
> y <- c(0,1,1)
> x & y
[1] FALSE  TRUE FALSE
> x | y
[1] TRUE TRUE TRUE
>
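Logical vectors can also be summarized. In arithmetic, TRUE counts as 1 and FALSE as 0, which the following sketch relies on (variable names are illustrative):

```r
x <- c(1, 2, 3, 4, 5)
big <- x > 3

n_big <- sum(big)   # how many elements satisfy the condition
frac <- mean(big)   # proportion of elements that satisfy it
some <- any(big)    # is at least one TRUE?
every <- all(big)   # are all TRUE?
flipped <- !big     # element-wise negation
```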


7 Missing values #

NA is a keyword meaning "Not available". is.na(x) returns TRUE for each element of x that is NA and FALSE otherwise. Two statements can be placed on one line by separating them with a semicolon (;), which marks the end of a statement much as in C; the statements are evaluated from left to right. NaN, short for "Not a Number", is a keyword like NA.

> z <- c(1:3, NA)
> ind <- is.na(z)
> z;ind
[1]  1  2  3 NA
[1] FALSE FALSE FALSE  TRUE
> z <- c(1:3, NA); ind <- is.na(z)
> z;ind
[1]  1  2  3 NA
[1] FALSE FALSE FALSE  TRUE
>
> Inf - Inf
[1] NaN
> 0/0
[1] NaN
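A few consequences of these definitions are easy to trip over, so here is a short sketch in base R (names are illustrative). Comparing with == does not detect NA, and is.na() is TRUE for NaN as well:

```r
z <- c(1:3, NA)

cmp <- z == NA                # every comparison with NA is itself NA
n_missing <- sum(is.na(z))    # count the missing values
avg <- mean(z, na.rm = TRUE)  # skip NA when averaging
plain <- mean(z)              # without na.rm the result is NA

nan_is_na <- is.na(NaN)       # NaN also counts as missing
na_is_nan <- is.nan(NA)       # but NA is not NaN
```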


8 Character vector #

A character vector is written with double quotes (") or single quotes ('), but values entered with single quotes are still printed with double quotes. For example, "x-values" and 'New iteration results' are character vectors.

> name <- c("lee", "jae", "hak");
> name
[1] "lee" "jae" "hak"
> name <- c('lee', 'jae', 'hak');
> name
[1] "lee" "jae" "hak"
>

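Escape sequences follow C style and start with the backslash character (\). Typing ?Quotes shows the following list in the help page. A double quote (") inside a string can be written as \". A backslash (\) is written as \\, and print() also displays it as \\.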

\n newline 
\r carriage return 
\t tab 
\b backspace 
\a alert (bell) 
\f form feed 
\v vertical tab 
\\ backslash \ 
\nnn character with given octal code (1, 2 or 3 digits) 
\xnn character with given hex code (1 or 2 hex digits) 
\unnnn Unicode character with given code (1–4 hex digits) 
\Unnnnnnnn Unicode character with given code (1–8 hex digits)
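An escape sequence is stored as a single character even though it is typed, and printed, as two. The sketch below checks this with nchar() and uses cat() to write the raw characters; the strings are invented for illustration:

```r
s <- "a\tb"
n_tab <- nchar(s)   # the tab counts as one character

bs <- "\\"
n_bs <- nchar(bs)   # one backslash, although print() shows "\\"

q <- "say \"hi\""
n_q <- nchar(q)     # the escaped quotes count as one character each

cat(bs, "\n")       # cat() writes the single backslash without escaping
```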

Strings are concatenated with the paste() function.

> x <- c("x", "y")
> paste(x, 1:10, sep="")
 [1] "x1"  "y2"  "x3"  "y4"  "x5"  "y6"  "x7"  "y8"  "x9"  "y10"
> 
> paste("proc", "ess", sep="/")
[1] "proc/ess"
> paste("proc", "ess", sep="")
[1] "process"
>
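paste() has two more commonly used forms: the collapse argument joins all elements into one string, and paste0() is a shorthand for sep="". A minimal sketch:

```r
joined <- paste(c("lee", "jae", "hak"), collapse = " ")     # one string from three
labels <- paste0("x", 1:3)                                   # same as paste(..., sep = "")
both <- paste(c("a", "b"), 1:2, sep = "-", collapse = "+")  # sep joins pairs, collapse joins results
```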


9 Index vector; selecting and modifying subsets of a data set #

Index vectors come in the following four kinds.

A logical vector
A vector of positive integral quantities
A vector of negative integral quantities
A vector of character strings

A logical vector

> x <- c(1,2,NA)
> y <- x[!is.na(x)] # assign the non-NA elements to y
> y
[1] 1 2
> x <- c(0,1,NA)
> x[1]
[1] 0
> x[2]
[1] 1
> x[3]
[1] NA
> (x+1)[(!is.na(x)) & x>0] -> z # add 1 to the elements that are not NA and greater than 0
> z
[1] 2
>

A vector of positive integral quantities

> x <- rep(c(1,2,2,1), times=4) # assign a vector that repeats 1,2,2,1 four times to x
> x
[1] 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1
> c("x", "y")[x]
 [1] "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x" "x" "y" "y" "x"
> 
> y <- c("x", "y")
> x <- c(1, 2, 3)
> y[x] # print the 1st, 2nd, and 3rd elements of y
[1] "x" "y" NA 
> y <- c("x", "y", "z")
> y[x]
[1] "x" "y" "z"
> y[x & (x > 1)]
[1] "y" "z"
>

A vector of negative integral quantities

> x <- c(1:10)
> x[1]
[1] 1
> x[9]
[1] 9
> x[-(1:5)]
[1]  6  7  8  9 10
>
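Negative indices are handy for dropping elements relative to the end, and they cannot be mixed with positive ones in a single index. A short sketch (names are illustrative):

```r
x <- 1:10

no_last <- x[-length(x)]        # drop the last element
no_ends <- x[-c(1, length(x))]  # drop the first and last elements

# Mixing a negative and a positive index is an error, so catch it here.
mixed_ok <- tryCatch({
  x[c(-1, 2)]
  TRUE
}, error = function(e) FALSE)
```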

A vector of character strings

> fruit <- c(5, 10, 1, 20)
> names(fruit) <- c("orange", "banana", "apple", "peach")
> names
function (x)  .Primitive("names")
> names(fruit)
[1] "orange" "banana" "apple"  "peach" 
> lunch <- fruit[c("apple", "orange")]
> lunch
 apple orange 
     1      5 
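> # Names are loosely comparable to enum labels: they are often easier to remember than numeric indices.
> # "orange" "banana" "apple" "peach" are names that identify the values {5, 10, 1, 20} of fruit (the names attribute).

An index expression can also appear on the left-hand side of an assignment.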
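Indexing by name and indexing by position select the same elements, and a name can also be used to replace a value. A sketch that rebuilds the fruit vector with names given directly in c(); "kiwi" is a made-up name that is not in the vector:

```r
fruit <- c(orange = 5, banana = 10, apple = 1, peach = 20)

by_pos <- fruit[c(3, 1)]                 # positions of apple and orange
by_name <- fruit[c("apple", "orange")]   # the same selection by name
same <- identical(by_pos, by_name)

fruit["apple"] <- 3                      # replace a value through its name
missing_name <- fruit["kiwi"]            # an unknown name yields NA
```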

> x <- c(-2, -1, 0, 1, 2, NA)
> x[is.na(x)] <- 0
> x
[1] -2 -1  0  1  2  0
> x[6] <- NA
> x
[1] -2 -1  0  1  2 NA
> x[x < 0] <- -x[x < 0]
Error in x[x < 0] <- -x[x < 0] : 
  NAs are not allowed in subscripted assignments
> x[6] <- 0
> -x[x < 0]
[1] 2 1
> x[x < 0] <- -x[x < 0]
> x
[1] 2 1 0 1 2 0
> paste("Page", 1:10)
 [1] "Page 1"  "Page 2"  "Page 3"  "Page 4"  "Page 5"  "Page 6"  "Page 7"  "Page 8"  "Page 9"  "Page 10"
>
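The error in the session above comes from the NA that x < 0 produces for the missing element. One way around it, shown here as a sketch, is to index with which(), which keeps only the positions where the condition is TRUE:

```r
x <- c(-2, -1, 0, 1, 2, NA)

neg <- which(x < 0)   # positions of the negative elements; the NA is dropped
x[neg] <- -x[neg]     # flip the sign without touching the NA

# For this particular task abs() gives the same result directly.
same_as_abs <- identical(x, abs(c(-2, -1, 0, 1, 2, NA)))
```

Unlike the version that first overwrites the NA with 0, this keeps the missing value in place.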


10 Other types of objects #

matrices or more generally arrays are multi-dimensional generalizations of vectors.
factors provide compact ways to handle categorical data.
lists are a general form of vector in which the various elements need not be of the same type, and are often themselves vectors or lists.
data frames are matrix-like structures, in which the columns can be of different types. Think of data frames as 'data matrices' with one row per observational unit but with (possibly) both numerical and categorical variables.
functions are themselves objects in R which can be stored in the project's workspace.
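To make the list above concrete, here is one small instance of each object type. This is only a sketch; all names and values are invented for illustration:

```r
m <- matrix(1:6, nrow = 2)              # matrix: 2 rows, 3 columns
f <- factor(c("a", "b", "a"))           # factor: categorical data with levels
l <- list(id = 1, tags = c("x", "y"))   # list: elements of different types
df <- data.frame(n = 1:3, g = f)        # data frame: columns of different types
sq <- function(v) v^2                   # function: an object like any other
```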

----

[1] The book calls this R's recycling rule.
