Using R
Importing data into R: Give your data a name; in this example we call it moma. read.csv() tells R to read the file as a CSV, file.choose() opens a window where you can browse to the file on your computer, and header=TRUE means that the first row of the CSV contains the column names. Press Enter to import the data, then type moma and press Enter to show it in the console window. Attach it with attach(moma) so you can refer to its columns by name in the operations that follow.
moma <- read.csv(file.choose(), header=TRUE)
attach(moma)
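For a scripted (non-interactive) import, the file path can be passed directly instead of using file.choose(). A minimal sketch, assuming a tiny made-up CSV written to a temporary file; the column names mirror the ones used in these notes, but the rows and the name demo are invented for illustration:

```r
# Write a tiny, made-up CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Title,YearCreated,YearAcquired,Gender",
             "Untitled A,1920,1940,Female",
             "Untitled B,1935,1984,Male"), tmp)

# header = TRUE: the first row holds the column names
demo <- read.csv(tmp, header = TRUE)

str(demo)    # check the type R assigned to each column
head(demo)   # show the first rows
```

Running str() right after an import is a quick way to spot columns that were read in as the wrong type.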
Scatterplots should compare two numeric variables.
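Sometimes a column is read in as the wrong type. This happened with my data: YearCreated was being treated as a factor (a category) rather than as numeric. To change this, convert through as.character() first, because calling as.numeric() directly on a factor returns the internal level codes rather than the years. If moma is already attached, run detach(moma) and attach(moma) again afterwards so the attached copy picks up the change.
moma$YearCreated <- as.numeric(as.character(moma$YearCreated))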
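A small sketch of why the conversion goes through as.character(). The vector yr below is made up for illustration; it is not from the MoMA data:

```r
# Made-up years stored as a factor (illustrative data only)
yr <- factor(c("1950", "1929", "1984", "1929"))

codes <- as.numeric(yr)                 # internal level codes, not years
years <- as.numeric(as.character(yr))   # the actual years

codes   # 2 1 3 1
years   # 1950 1929 1984 1929
```

The level codes are just positions in the sorted list of levels, so any statistic or plot built on them would be silently wrong.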
Producing Scatterplots:
Pearson’s correlation
cor(Age, Height)
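As a rough illustration of cor(), here is a sketch on two made-up vectors (created and acquired are invented names and values, not the MoMA columns). The method argument is shown only to make the default explicit:

```r
# Made-up values (illustrative data only)
created  <- c(1900, 1920, 1950, 1970, 1990)
acquired <- c(1940, 1945, 1960, 1984, 1995)

r_pearson  <- cor(created, acquired)                       # Pearson (default)
r_spearman <- cor(created, acquired, method = "spearman")  # rank-based

r_pearson
r_spearman
```

If either vector contains missing values, cor() returns NA unless the use argument is set, for example use = "complete.obs".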
The variable that is listed first will appear on the x axis. Give the scatterplot a title using the main argument. Label the x and y axes with the xlab and ylab arguments. You can also adjust the size of the plot points with the cex argument. The pch argument changes the shape of the plotting character. Color can be changed with the col argument.
plot(YearCreated, YearAcquired, main="MoMA Photography Collection", xlab="Year Created", ylab="Year Acquired", cex=0.5, pch=8, col=2)
plot(YearAcquired[Gender=="Female"], YearCreated[Gender=="Female"], col=4, main="Female Artists")
The points() function adds more points to the current plot without replacing it.
points(YearAcquired[Gender=="Male"], YearCreated[Gender=="Male"])
To have these appear as separate plots on one screen, first split the plotting window into one row and two columns:
par(mfrow=c(1,2))
plot(YearAcquired[Gender=="Female"], YearCreated[Gender=="Female"], col=4, main="Female Artists")
plot(YearAcquired[Gender=="Male"], YearCreated[Gender=="Male"], col=4, main="Male Artists")
To reset to one plot per screen: par(mfrow=c(1,1))
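The side-by-side layout can be sketched end to end on a small made-up data frame. The name demo and its rows are invented for illustration; saving the value returned by par() is one common way to restore the previous layout afterwards:

```r
# Made-up data frame (illustrative data only)
demo <- data.frame(
  YearCreated  = c(1900, 1925, 1950, 1960, 1975, 1980),
  YearAcquired = c(1940, 1941, 1964, 1984, 1985, 1990),
  Gender       = c("Female", "Male", "Female", "Male", "Female", "Male")
)

women <- demo[demo$Gender == "Female", ]
men   <- demo[demo$Gender == "Male", ]

old <- par(mfrow = c(1, 2))   # 1 row, 2 columns; remember the old setting
plot(women$YearAcquired, women$YearCreated, col = 4, main = "Female Artists")
plot(men$YearAcquired, men$YearCreated, col = 4, main = "Male Artists")
par(old)                      # restore the previous layout
```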
Labeling axes (the y axis is side 2, the x axis is side 1):
axis(side=2, at=c(1929, 1940, 1984), labels=c(1929, 1940, 1984))

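axis(2, at=c(1929, 1940, 1984), labels=c(1929, 1940, 1984))
The tick positions have to be listed with c(); seq(1929, 1940, 1984) would return only 1929, because the third argument of seq() is the step size. The three years mark the museum's founding (1929), the establishment of the Department of Photography (1940), and the expansion (1984).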
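A sketch of custom tick labels on a made-up plot. The points are invented; yaxt = "n" is used here so the default y axis does not collide with the custom one, and ylim is widened so all three years fall inside the plot:

```r
# Made-up points (illustrative data only)
created  <- c(1900, 1950, 1980)
acquired <- c(1935, 1960, 1990)

milestones <- c(1929, 1940, 1984)

# yaxt = "n" suppresses the default y axis so only the custom ticks appear
plot(created, acquired, yaxt = "n", ylim = c(1925, 1995),
     xlab = "Year Created", ylab = "Year Acquired")
axis(side = 2, at = milestones, labels = milestones, las = 1)

# For evenly spaced ticks, seq() takes from, to, and a step size
decades <- seq(1930, 1990, by = 10)
decades
```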
Linear Regression: Here we predict YearAcquired from YearCreated and draw the fitted line on the existing scatterplot with abline(). The col argument again sets the color, and lwd sets the line's width.
abline(lm(YearAcquired~YearCreated), col=2, lwd=5)
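Storing the model in a variable makes it possible to inspect the fit as well as draw it. A minimal sketch on made-up vectors (created, acquired, and fit are invented names, not from the MoMA data):

```r
# Made-up values (illustrative data only)
created  <- c(1900, 1920, 1950, 1970, 1990)
acquired <- c(1940, 1945, 1960, 1984, 1995)

fit <- lm(acquired ~ created)   # response ~ predictor
coef(fit)                       # intercept and slope

plot(created, acquired)         # x = predictor, y = response
abline(fit, col = 2, lwd = 2)   # add the fitted line to the plot

# Predicted acquisition year for a work created in 1960
pred <- predict(fit, newdata = data.frame(created = 1960))
pred
```

The scatterplot needs the predictor on the x axis and the response on the y axis for the line from abline() to match the points.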
Some calculations on categorical variables.
Proportion of each category: the count in each category divided by the number of observations.
table(Gender)/length(Gender)  # using length() instead of a typed-in count guards against typos or changes that may happen in the dataset
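A quick sketch of the proportion calculation on a made-up vector (gender is an invented example, not the MoMA column). Base R's prop.table() gives the same result:

```r
# Made-up categories (illustrative data only)
gender <- c("Female", "Male", "Male", "Male")

by_length <- table(gender) / length(gender)
by_prop   <- prop.table(table(gender))

by_length   # Female 0.25, Male 0.75
by_prop     # same proportions
```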
Trimmed mean and other summary statistics:
mean(YearAcquired, trim=.10)  # trimmed mean: drops the lowest and highest 10 percent of observations (outliers) before averaging
var(YearAcquired)  # variance
sd(YearAcquired)  # standard deviation
min(YearAcquired)  # minimum observation
max(YearAcquired)  # maximum observation
Note: Pearson is the default method for cor().
summary(moma)  # summary of the entire dataset
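The effect of the trim argument is easiest to see on a small vector with one obvious outlier. The values below are made up for illustration:

```r
# Made-up values with one outlier (illustrative data only)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

plain   <- mean(x)               # 14.5, pulled up by the outlier
trimmed <- mean(x, trim = 0.10)  # drops 1 and 100, averages 2 to 9: 5.5

plain
trimmed
```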
“Exclude missing values
We can exclude missing values in a couple different ways. First, if we want to exclude missing values from mathematical operations use the na.rm = TRUE argument. If you do not exclude these values most functions will return an NA.”
# excluding NA values will calculate the mathematical operation for all non-missing values
mean(x, na.rm = TRUE)
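A short sketch of the difference on a made-up vector:

```r
# Made-up values with one missing entry (illustrative data only)
x <- c(2, 4, NA, 6)

with_na    <- mean(x)                # NA
without_na <- mean(x, na.rm = TRUE)  # 4
n_missing  <- sum(is.na(x))          # 1: how many values were skipped

with_na
without_na
n_missing
```

Counting the missing values alongside the result shows how many observations the statistic is actually based on.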
data3$removeCirca <- as.numeric(as.character(data3$removeCirca))


Create a subset of the data:

photoexpand <- moma7[moma7$YearAcquired > 1983, ]

photodept <- moma7[moma7$YearAcquired > 1939, ]

Check the work: photoexpand[1:10, ]
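One thing to watch when subsetting with a logical condition is missing values. A sketch on a made-up data frame (demo and its rows are invented for illustration) comparing bracket indexing with base R's subset():

```r
# Made-up data frame with one missing year (illustrative data only)
demo <- data.frame(
  Title        = c("A", "B", "C", "D", "E"),
  YearAcquired = c(1935, 1940, 1984, 1990, NA)
)

# Bracket indexing keeps a row of NAs where the condition is NA
by_bracket <- demo[demo$YearAcquired > 1983, ]

# subset() drops rows where the condition is NA
by_subset <- subset(demo, YearAcquired > 1983)

nrow(by_bracket)   # 3 (two matches plus one all-NA row)
nrow(by_subset)    # 2
```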

barplot(percent, main="Photography Collection Provenance", xlab="Credit Line", ylab="%", las=1)

For numeric variables that contain missing values, add na.rm = TRUE:

mean(Width, na.rm = TRUE)

Overlapping credit line labels? Set las=2 so the axis labels are drawn perpendicular to the axis.
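These notes do not show how percent was built. One plausible sketch, assuming it is a table of credit-line percentages; the credit vector below is made up for illustration:

```r
# Made-up credit lines (illustrative data only)
credit <- c("Gift", "Purchase", "Gift", "Bequest", "Gift")

# Assumption: percent is the share of each credit line, in percent
percent <- 100 * table(credit) / length(credit)
percent

# las = 2 draws the axis labels perpendicular to the axis
barplot(percent, main = "Photography Collection Provenance",
        xlab = "Credit Line", ylab = "%", las = 2)
```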

Carte-de-visite: 2 1/2 by 4 1/4

Victoria: 5 by 3 1/2

Cabinet: 6 1/2 by 4 1/2

Promenade: 7 by 4

Panel: 8 by 4 1/4

Boudoir: 5 1/4 by 8 1/4

Imperial: 9 7/8 by 7 7/8

(these are all in inches)