Using R

Importing data into R: give your data a name (here we call it moma) and tell R to read the file as a CSV. file.choose() opens a window where you can browse to the file on your computer, and header=TRUE means that the first row of the CSV contains the column names. Press Enter to import the data, then type moma and press Enter to display it in the console window. Finally, attach the data so you can refer to its columns by name (for example YearCreated instead of moma$YearCreated).

moma <- read.csv(file.choose(), header=TRUE)

moma

attach(moma)
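file.choose() is convenient interactively, but a script that reads from a fixed path can be rerun without clicking through a dialog. The sketch below uses read.csv's text argument with a few made-up rows (not real MoMA data) so that it runs on its own; with your own file you would pass a path such as "moma.csv" (a hypothetical file name) instead. str() then shows the type R assigned to each column, which is the quickest way to spot a year column that was read as text (R 4.0 and later) or as a factor (earlier versions).

```r
# Made-up sample rows standing in for the real CSV file (hypothetical data)
csv_text <- "Title,YearCreated,YearAcquired,Gender
Untitled,1925,1941,Female
Street Scene,1938,1952,Male
Portrait,c. 1930,1964,Male"

# header = TRUE: the first row holds the column names
moma_sample <- read.csv(text = csv_text, header = TRUE)

# Show each column's type; YearCreated is not numeric because of "c. 1930"
str(moma_sample)
```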

Scatterplots should compare two numeric variables.

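Sometimes a column is read in with the wrong type. This happened with my data: YearCreated was being treated as a factor (a category) rather than as a number. To convert it, go through as.character() first, because as.numeric() applied directly to a factor returns the internal level codes (1, 2, 3, ...) instead of the years:

moma$YearCreated <- as.numeric(as.character(moma$YearCreated))

Entries that are not plain numbers become NA. Because attach() works on a copy of the data, run detach(moma) and then attach(moma) again so that YearCreated refers to the converted column.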
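Why the conversion route matters: a factor stores each value as an integer code pointing into its sorted levels, so as.numeric() on a factor returns those codes rather than the values they label. The toy vector below is made up to show the difference.

```r
# A made-up year column stored as a factor
years <- factor(c("1990", "1985", "2001"))

# Wrong: returns the internal level codes (levels are sorted "1985", "1990", "2001")
as.numeric(years)                # 2 1 3

# Right: convert to text first, then to numbers
as.numeric(as.character(years))  # 1990 1985 2001
```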

Producing Scatterplots:

Pearson’s correlation

cor(Age, Height)
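Pearson's r is the covariance of the two variables divided by the product of their standard deviations. This sketch recomputes it by hand on made-up numbers (x and y are toy stand-ins for Age and Height, not real data) to show what cor() returns.

```r
# Made-up example values
x <- c(4, 6, 8, 10, 12)
y <- c(102, 115, 128, 138, 150)

# Pearson's r = cov(x, y) / (sd(x) * sd(y))
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

all.equal(r_manual, r_builtin)  # TRUE
```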

In plot(), the variable listed first appears on the x axis. Give the scatterplot a title with the main argument, and label the x and y axes with the xlab and ylab arguments. You can also adjust the size of the plotted points with the cex argument, change the shape of the plotting character with pch, and change the color with col.

plot(YearCreated, YearAcquired, main="MoMA Photography Collection", xlab="Year Created", ylab="Year Acquired", cex=0.5, pch=8, col=2)

plot(YearCreated[Gender=="Female"], YearAcquired[Gender=="Female"], col=4, main="Female Artists")

The points() function adds more points to the existing plot instead of replacing it:

points(YearCreated[Gender=="Male"], YearAcquired[Gender=="Male"])
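When one group is overlaid on another with points(), the axes are set by the first plot() call, so points from the second group that fall outside that range are silently clipped. Setting xlim and ylim to cover both groups, and adding a legend, avoids this. The sketch uses a small made-up data frame (toy values with column names matching the notes) and writes to a temporary PDF only so that it runs without a screen; interactively you would leave out pdf() and dev.off().

```r
# Made-up data standing in for the attached moma columns
toy <- data.frame(
  YearCreated  = c(1920, 1935, 1950, 1925, 1940, 1960),
  YearAcquired = c(1941, 1960, 1975, 1950, 1944, 1985),
  Gender       = c("Female", "Female", "Female", "Male", "Male", "Male")
)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)  # draw to a file instead of the screen

fem  <- toy$Gender == "Female"
male <- toy$Gender == "Male"

# Female artists in blue (col = 4); xlim/ylim cover both groups
plot(toy$YearCreated[fem], toy$YearAcquired[fem], col = 4,
     xlim = range(toy$YearCreated), ylim = range(toy$YearAcquired),
     xlab = "Year Created", ylab = "Year Acquired",
     main = "MoMA Photography Collection")

# Add male artists to the same plot in red (col = 2)
points(toy$YearCreated[male], toy$YearAcquired[male], col = 2)
legend("topleft", legend = c("Female", "Male"), col = c(4, 2), pch = 1)

dev.off()
```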

To show the two groups in separate plots side by side on one screen, split the plotting area into one row and two columns:

par(mfrow=c(1,2))

plot(YearCreated[Gender=="Female"], YearAcquired[Gender=="Female"], col=4, main="Female Artists")

plot(YearCreated[Gender=="Male"], YearAcquired[Gender=="Male"], col=4, main="Male Artists")

To reset to a single plot: par(mfrow=c(1,1))
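par() returns the settings it replaces, so you can store them and restore them afterwards instead of typing the reset by hand. A minimal sketch with made-up points, drawn to a temporary PDF so it runs unattended:

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# par() invisibly returns the old values of the settings it changes
old_par <- par(mfrow = c(1, 2))  # one row, two columns
plot(1:5, c(2, 4, 3, 5, 6), col = 4, main = "Female Artists")
plot(1:5, c(5, 3, 4, 2, 1), col = 4, main = "Male Artists")

par(old_par)  # back to whatever layout was in effect before
layout_after <- par("mfrow")

dev.off()
```

This pattern also works inside functions, where the previous layout may not have been a single plot.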

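Labeling an axis: axis() adds tick marks and labels to the current plot (side 1 is the x axis, side 2 is the y axis).

axis(side=2, at=c(1929, 1940, 1984), labels=c(1929, 1940, 1984))

For evenly spaced ticks, seq() takes from, to and by, in that order. Note that seq(1929,1940,1984) returns only 1929, because the step (1984) is larger than the distance from 1929 to 1940. For ticks every ten years, for example (labels default to the tick values when omitted):

axis(2, at=seq(1930, 1990, by=10))

The years 1929, 1940 and 1984 mark MoMA's founding, the establishment of the Department of Photography, and the museum's expansion.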
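axis() adds to an existing plot, so it usually pairs with yaxt = "n" in plot() to suppress the default ticks, and its labels can be text as well as numbers. This sketch labels the three key years with short names (the label wording and the toy points are assumptions for illustration) and shows the seq(from, to, by) form for evenly spaced ticks. It draws to a temporary PDF so it runs unattended.

```r
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Made-up points; yaxt = "n" suppresses the default y axis
plot(c(1900, 1950, 2000), c(1930, 1960, 1990), ylim = c(1920, 2000),
     yaxt = "n", xlab = "Year Created", ylab = "Year Acquired")

# Ticks at chosen years with text labels (side 2 = y axis, las = 1 = horizontal)
key_years <- c(1929, 1940, 1984)
axis(side = 2, at = key_years,
     labels = c("Founding", "Photo dept.", "Expansion"), las = 1)

# Evenly spaced ticks on the right-hand axis (side 4): seq(from, to, by)
axis(side = 4, at = seq(1930, 1990, by = 10))

dev.off()
```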

Linear regression: here we predict YearAcquired from YearCreated and draw the fitted line on the current scatterplot with abline(). The col argument sets the color again, and lwd increases the line's width.

abline(lm(YearAcquired~YearCreated), col=2, lwd=5)
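abline() only draws the fitted line, but the fitted model itself is often worth keeping. A sketch on made-up values (a toy relationship where every photograph is acquired exactly 10 years after it was made) showing coef() and predict(); passing data = keeps the model tied to one data frame rather than to attached columns.

```r
# Made-up data: acquisition always 10 years after creation
toy <- data.frame(YearCreated = c(1920, 1930, 1945, 1960, 1975))
toy$YearAcquired <- toy$YearCreated + 10

# Store the model: YearAcquired predicted from YearCreated
fit <- lm(YearAcquired ~ YearCreated, data = toy)
coef(fit)  # intercept about 10, slope about 1

# Predicted acquisition year for a photograph made in 1950
predict(fit, newdata = data.frame(YearCreated = 1950))  # about 1960
```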


Doing some calculations on categorical variables.

table(Gender)

To get the proportion in each category, divide the counts by the number of observations:

table(Gender)/30907

or

table(Gender)/length(Gender)  # length() counts the observations, which guards against typos and against changes to the dataset
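prop.table() computes proportions directly from a table. One difference worth knowing: table() drops missing values (NA) by default while length() counts them, so when some genders are missing the two approaches disagree. Illustrated with a made-up vector:

```r
# Made-up gender column with one missing value
g <- c("Female", "Male", "Male", NA)

table(g) / length(g)  # Female 0.25, Male 0.50: the denominator includes the NA
prop.table(table(g))  # Female about 0.33, Male about 0.67: shares among non-missing values
```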

Numeric variables

mean(YearAcquired)

Trimmed mean:

mean(YearAcquired, trim=.10)  # drops the lowest 10% and the highest 10% of observations before averaging, which reduces the influence of outliers
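With trim = 0.10, mean() sorts the values, drops 10% of the observations (rounded down) from each end, and averages the rest. A made-up example in which one very early and one very late year pull the ordinary mean:

```r
# Ten made-up acquisition years with one very early and one very late value
x <- c(1850, 1930, 1935, 1940, 1945, 1950, 1955, 1960, 1965, 2015)

mean(x)               # 1944.5
mean(x, trim = 0.10)  # 1947.5: drops 1 value (10% of 10) from each end

# The same trimmed mean by hand: sort, drop the lowest and highest, average the rest
sorted <- sort(x)
mean(sorted[2:9])     # 1947.5
```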

median(YearAcquired)

var(YearAcquired)  # variance

sd(YearAcquired)  # standard deviation

min(YearAcquired)  # minimum observation

max(YearAcquired)  # maximum observation

range(YearAcquired)  # minimum and maximum together
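These spread measures are related: sd() is the square root of var(), var() uses the sample denominator n - 1, and range() returns the minimum and maximum in one call. Checked on a made-up vector:

```r
x <- c(1941, 1952, 1964, 1975, 1988)  # made-up years

var(x)          # sample variance (divides by n - 1)
sd(x)           # standard deviation = square root of the variance
range(x)        # same as c(min(x), max(x)): 1941 1988
diff(range(x))  # spread between earliest and latest: 47
```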

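For cor(), Pearson is the default method; to compute Spearman's rank correlation instead, add method="spearman":

cor(Age, Height, method="spearman")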
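Spearman's method correlates the ranks rather than the raw values, so it measures whether two variables rise and fall together without requiring a straight-line relationship. A made-up illustration:

```r
# Made-up values that rise together, but not along a straight line
x <- 1:6
y <- x^3

cor(x, y)                       # Pearson: high but below 1 (the relationship is curved)
cor(x, y, method = "spearman")  # Spearman: 1 (the ranks agree perfectly)
```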

summary(YearAcquired)

summary(moma)  # summary of every column in the data frame

Exclude missing values

“We can exclude missing values in a couple different ways. First, if we want to exclude missing values from mathematical operations use the na.rm = TRUE argument. If you do not exclude these values most functions will return an NA.”

# excluding NA values will calculate the mathematical operation for all non-missing values

mean(x, na.rm = TRUE)
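The effect of na.rm on a made-up vector with one missing value; is.na() is handy for counting how many values were left out:

```r
x <- c(1941, NA, 1964, 1975)  # made-up values with one missing

mean(x)                # NA: a single missing value propagates
mean(x, na.rm = TRUE)  # 1960, the mean of the three non-missing values
sum(is.na(x))          # 1 value is missing
```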

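rm(data1)  # remove the object data1 from the workspace

attach(data3)  # attach another data frame; detach the previous one first so that column names do not clash

levels(Simple.Credit)  # list the categories of a factor (returns NULL if the column was read as character)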

data3$removeCirca <- as.numeric(as.character(data3$removeCirca))  # convert through as.character() so the values, not the level codes, are kept
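Dates recorded as "c. 1930" will not convert cleanly. Assuming the circa prefix looks like "c. " (an assumption about the raw data; adjust the pattern to match yours), sub() can strip it before conversion. Anything still non-numeric, such as a date range, becomes NA, and as.numeric() would warn about it.

```r
# Made-up raw date strings
raw <- c("1923", "c. 1930", "1945-1950")

# Remove a leading "c." and any spaces after it (assumed circa format)
cleaned <- sub("^c\\.\\s*", "", raw)  # "1923" "1930" "1945-1950"

# Convert; the date range cannot be parsed and becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
years <- suppressWarnings(as.numeric(cleaned))
years  # 1923 1930 NA
```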

Chart of point shapes (the same numeric codes are used by the pch argument): http://www.sthda.com/english/wiki/ggplot2-point-shapes

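Create a subset of the data: the rows where the condition is TRUE, with all columns (the empty space after the comma). Refer to the column through moma7$ so the condition comes from the same data frame that is being subset.

photoexpand <- moma7[moma7$YearAcquired > 1983, ]

photodept <- moma7[moma7$YearAcquired > 1939, ]

photoexpand holds the photographs acquired from 1984 (the expansion) onward, and photodept those acquired from 1940 (the Department of Photography) onward. Check the work by viewing the first ten rows: photoexpand[1:10, ]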
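If YearAcquired has missing values, logical indexing returns a row of NAs for each one, because the condition itself is NA there. subset(), or wrapping the condition in which(), keeps only the rows where the condition is TRUE. A made-up data frame for illustration (moma7 itself is not reproduced here):

```r
# Made-up data frame with one missing acquisition year
toy <- data.frame(Title = c("A", "B", "C", "D"),
                  YearAcquired = c(1930, 1985, NA, 1990))

toy[toy$YearAcquired > 1983, ]         # 3 rows: B, an all-NA row, and D
subset(toy, YearAcquired > 1983)       # 2 rows: B and D
toy[which(toy$YearAcquired > 1983), ]  # the same 2 rows
```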

barplot(percent, main="Photography Collection Provenance", xlab="Credit Line", ylab="%", las=1)
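The notes do not show how percent was built. One way, assuming a credit-line column (called credit_line here, a hypothetical name), is to turn a table of counts into percentages. The values below are made up, and the plot goes to a temporary PDF so the sketch runs unattended.

```r
# Made-up credit lines (hypothetical column)
credit_line <- c("Purchase", "Gift", "Gift", "Purchase", "Gift", "Exchange")

# Counts per category, converted to percentages of all observations
percent <- 100 * prop.table(table(credit_line))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
# las = 1 prints the axis labels horizontally; las = 2 would draw them perpendicular to the axis
barplot(percent, main = "Photography Collection Provenance",
        xlab = "Credit Line", ylab = "%", las = 1)
dev.off()
```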

For numeric columns that contain missing values (for example, entries that became NA when non-numeric text was converted to numbers), add na.rm = TRUE:

mean(Width, na.rm = TRUE)

If the credit-line labels overlap, set las=2 so that the axis labels are drawn perpendicular to the axis.

Carte de visite: 2 1/2 by 4 1/4

Victoria: 5 by 3 1/2

Cabinet: 6 1/2 by 4 1/2

Promenade: 7 by 4

Panel: 8 by 4 1/4

Boudoir: 5 1/4 by 8 1/4

Imperial: 9 7/8 by 7 7/8

(All sizes are in inches.)
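To compare these formats with measured widths and heights in R, the fractions need to become decimals. A sketch that stores the sizes above as a data frame in decimal inches, keeping the two dimensions in the order listed (the column names are assumptions), and adds centimetre versions using the exact factor of 2.54 cm per inch:

```r
formats <- data.frame(
  format = c("Carte de visite", "Victoria", "Cabinet", "Promenade",
             "Panel", "Boudoir", "Imperial"),
  dim1_in = c(2.5, 5, 6.5, 7, 8, 5.25, 9.875),     # first dimension listed, inches
  dim2_in = c(4.25, 3.5, 4.5, 4, 4.25, 8.25, 7.875) # second dimension listed, inches
)

# 1 inch = 2.54 cm exactly
formats$dim1_cm <- formats$dim1_in * 2.54
formats$dim2_cm <- formats$dim2_in * 2.54

formats
```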