row.names in dunes and dunes.env?


row.names in dunes and dunes.env?

Chris Butler
Hello,

I've got a small dataset on box turtle shell measurements that I would like to perform a detrended correspondence analysis on. I thought that it would be interesting to examine the morphometrics for each species in the area of overlap and in areas where neither species occurs. 

I've taken a look at the dune and dune.env datasets in vegan. Using the str() command gives me 

> str(dune)
'data.frame':   20 obs. of  30 variables:
 $ Belper: num  3 0 2 0 0 0 0 2 0 0 ...
 $ Empnig: num  0 0 0 0 0 0 0 0 0 0 ...
 $ Junbuf: num  0 3 0 0 0 0 0 0 0 0 ...
 $ Junart: num  0 0 0 3 0 0 4 0 0 3 ...
 ...

However, when I try looking directly at the data frame using the edit command I see that there is a column called "row.names" to the left of "Belper".

Likewise, when I use the str() command on dune.env I get

> str(dune.env)
'data.frame':   20 obs. of  5 variables:
 $ A1        : num  3.5 6 4.2 5.7 4.3 2.8 4.2 6.3 4 11.5 ...
 $ Moisture  : Ord.factor w/ 4 levels "1"<"2"<"4"<"5": 1 4 2 4 1 1 4 1 2 4 ...
 $ Management: Factor w/ 4 levels "BF","HF","NM",..: 1 4 4 4 2 4 2 2 3 3 ...
 $ Use       : Ord.factor w/ 3 levels "Hayfield"<"Haypastu"<..: 2 2 2 3 2 2 3 1 1 2 ...
 $ Manure    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 3 4 5 4 3 5 4 3 1 1 ...

but using the edit() command shows a column named "row.names".

I assume that the "row.names" column is used to link the two files together.

My turtle data is saved as a *.csv, and I've added a column called "row.names", so that it looks like this

row.names,CL,CCL,CW,CCW,CH,CCH
1,104.4,131.8,89.887,137.4,43.391,89.7
2,108.79,135.9,87.78,118.1,50.72,71.2
3,114.12,126.1,89.33,132.8,142.39,78.3
4,102.87,128.2,84.2,125,45.42,72.4
5,84.6,104.8,72.61,111.8,41.1,57.3

I've called this file "turtles_dca.csv". I've also created a file called "turtles_dca_env.csv" that looks like this

row.names,Species,Sex,Distribution,Concatenated,Species_overlap
1,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
2,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
3,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
4,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
5,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap

However, when I read the data into R using this command

turtles.env = read.csv("turtles_dca_env.csv", header = TRUE)


and then using the str() command I get 


> str(turtles)
'data.frame':   67 obs. of  7 variables:
 $ row.names: int  1 2 3 4 5 6 7 8 9 10 ...
 $ CL       : num  104.4 108.8 114.1 102.9 84.6 ...
 $ CCL      : num  132 136 126 128 105 ...
 $ CW       : num  89.9 87.8 89.3 84.2 72.6 ...
 $ CCW      : num  137 118 133 125 112 ...
 $ CH       : num  43.4 50.7 142.4 45.4 41.1 ...
 $ CCH      : num  89.7 71.2 78.3 72.4 57.3 73.4 67 57 68.8 68 ...

When I run decorana() on this dataset, it appears that the column "row.names" is included in the analysis, which isn't what I'm looking for. 

If I go ahead and delete the column "row.names" from my data frames (i.e. removing it from turtles and turtles.env), I don't believe that the analysis is performed correctly. The two species differ significantly in most of their measurements, but the ordihull() and ordispider() commands show them overlapping almost completely.

I think that I'm missing something pretty basic about inputting and formatting this data for this analysis. Can anyone offer a suggestion on where I'm going astray? I can send a copy of the data if anyone wants to look at it.

Best wishes,
Chris
University of Central Oklahoma


______________________________________________
[hidden email] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: row.names in dunes and dunes.env?

PIKAL Petr
Hi

see inline

>
> Hello,
>
> I've got a small dataset on box turtle shell measurements that I would
> like to perform a detrended correspondence analysis on. I thought that it
> would be interesting to examine the morphometrics for each species in the
> area of overlap and in areas where neither species occurs.
>
> I've taken a look at the dune and dune.env datasets in vegan. Using the
> str() command gives me
>
> > str(dune)
> 'data.frame':   20 obs. of  30 variables:
>  $ Belper: num  3 0 2 0 0 0 0 2 0 0 ...
>  $ Empnig: num  0 0 0 0 0 0 0 0 0 0 ...
>  $ Junbuf: num  0 3 0 0 0 0 0 0 0 0 ...
>  $ Junart: num  0 0 0 3 0 0 4 0 0 3 ...
>  ...
>
> However, when I try looking directly at the data frame using the edit
> command I see that there is a column called "row.names" to the left of
> "Belper".
>
> Likewise, when I use the str() command on dune.env I get
>
> > str(dune.env)
> 'data.frame':   20 obs. of  5 variables:
>  $ A1        : num  3.5 6 4.2 5.7 4.3 2.8 4.2 6.3 4 11.5 ...
>  $ Moisture  : Ord.factor w/ 4 levels "1"<"2"<"4"<"5": 1 4 2 4 1 1 4 1 2 4 ...
>  $ Management: Factor w/ 4 levels "BF","HF","NM",..: 1 4 4 4 2 4 2 2 3 3 ...
>  $ Use       : Ord.factor w/ 3 levels "Hayfield"<"Haypastu"<..: 2 2 2 3 2 2 3 1 1 2 ...
>  $ Manure    : Ord.factor w/ 5 levels "0"<"1"<"2"<"3"<..: 3 4 5 4 3 5 4 3 1 1 ...
>
> but using the edit() command shows a column named "row.names".

No. This is not a column; it is what it says: row names.

> str(rosin)
'data.frame':   10 obs. of  5 variables:
 $ pytel: int  1 2 3 4 5 6 7 8 9 10
 $ rstr : num  1.022 0.981 0.992 1.01 0.976 ...
 $ gama : num  1.4 1.44 1.41 1.43 1.39 ...
 $ cas  : int  0 3 6 9 12 15 18 21 24 27
 $ typ  : chr  "anatas" "anatas" "anatas" "anatas"


> head(rosin)
  pytel      rstr     gama cas    typ
1     1 1.0216621 1.397885   0 anatas
2     2 0.9809663 1.442439   3 anatas
3     3 0.9916211 1.411767   6 anatas
^^ these are row names

>
> I assume that the "row.names" column is used to link the two files
> together.

If you are in doubt, the recommended way is to consult the documentation.

?row.names
All data frames have a row names attribute, a character vector of length
the number of rows with no duplicates nor missing values.
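
To illustrate the point that row names are an attribute rather than a column, here is a minimal sketch. The data frame df and its values are made up for illustration and are not the turtle data.

```r
# Small made-up data frame; names and values are illustrative only.
df <- data.frame(CL = c(104.4, 108.79, 114.12),
                 CW = c(89.887, 87.78, 89.33))

n_cols <- ncol(df)                    # row names are not counted as a column
rn <- row.names(df)                   # automatic row names, as a character vector
attr_names <- names(attributes(df))   # row names are stored as an attribute

# Row names can be replaced, and then used to index rows.
row.names(df) <- c("t1", "t2", "t3")
cl_t2 <- df["t2", "CL"]
```

This is why str() lists only the real variables, while a spreadsheet-like view such as edit() displays the row names in an extra leading column.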

>
> My turtle data is saved as a *.csv, and I've added a column called
> "row.names", so that it looks like this
>
> row.names,CL,CCL,CW,CCW,CH,CCH
> 1,104.4,131.8,89.887,137.4,43.391,89.7
> 2,108.79,135.9,87.78,118.1,50.72,71.2
> 3,114.12,126.1,89.33,132.8,142.39,78.3
> 4,102.87,128.2,84.2,125,45.42,72.4
> 5,84.6,104.8,72.61,111.8,41.1,57.3
>
> I've called this file "turtles_dca.csv". I've also created a file called
> "turtles_dca_env.csv" that looks like this
>
> row.names,Species,Sex,Distribution,Concatenated,Species_overlap
> 1,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
> 2,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
> 3,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
> 4,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
> 5,Terrapene_ornata,Female,overlap,TO_F_Overlap,TO_Overlap
>
> However, when I read the data into R using this command
>
> turtles.env = read.csv("turtles_dca_env.csv", header = TRUE)
>
>
> and then using the str() command I get
>
>
> > str(turtles)
> 'data.frame':   67 obs. of  7 variables:
>  $ row.names: int  1 2 3 4 5 6 7 8 9 10 ...
>  $ CL       : num  104.4 108.8 114.1 102.9 84.6 ...
>  $ CCL      : num  132 136 126 128 105 ...
>  $ CW       : num  89.9 87.8 89.3 84.2 72.6 ...
>  $ CCW      : num  137 118 133 125 112 ...
>  $ CH       : num  43.4 50.7 142.4 45.4 41.1 ...
>  $ CCH      : num  89.7 71.2 78.3 72.4 57.3 73.4 67 57 68.8 68 ...
>
> When I run decorana() on this dataset, it appears that the column
> "row.names" is included in the analysis, which isn't what I'm looking for.

Then why did you add this column to your data?
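
If the identifier column has to stay in the CSV file, one possible way to keep it out of the analysis is sketched below. It assumes the first column holds unique identifiers; the inline text only stands in for the first lines of "turtles_dca.csv" so that the example runs on its own.

```r
# Inline stand-in for the first lines of the CSV file shown in the question.
csv_text <- "row.names,CL,CCL,CW,CCW,CH,CCH
1,104.4,131.8,89.887,137.4,43.391,89.7
2,108.79,135.9,87.78,118.1,50.72,71.2
3,114.12,126.1,89.33,132.8,142.39,78.3"

# Option 1: tell read.csv() that column 1 holds the row names,
# so it becomes the row names attribute and not a variable.
turtles <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# Option 2: read everything, then drop the identifier column by name.
raw <- read.csv(text = csv_text, header = TRUE)
turtles2 <- raw[, names(raw) != "row.names"]
```

With a real file, the text = argument would be replaced by the file name. Either way, the resulting data frame holds only the six measurement columns.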

>
> If I go ahead and delete the column "row.names" from my data frames (i.e.
> removing it from turtles and turtles.env), I don't believe that the
> analysis is performed correctly. The two species differ significantly in
> most of their measurements, but the ordihull() and ordispider() commands
> show them overlapping almost completely.
>
> I think that I'm missing something pretty basic about inputting and
> formatting this data for this analysis. Can anyone offer a suggestion on
> where I'm going astray? I can send a copy of the data if anyone wants to
> look at it.

I am not familiar with the functions you use. However, you probably want to
link those two files together. If both are in the same order, you can
just do

turtles.complet <- cbind(turtles, turtles.env)

Or, if they are in a different order, you need to find some common column(s)
and

?merge

those two files.
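
A minimal sketch of both approaches, using made-up data: the column name id and all values below are hypothetical and only stand in for whatever common column the two files share.

```r
# Made-up measurement and environment tables sharing a key column "id".
turtles <- data.frame(id = 1:3, CL = c(104.4, 108.79, 114.12))
turtles.env <- data.frame(id = c(3L, 1L, 2L),
                          Sex = c("Female", "Male", "Female"))

# Rows are in a different order here, so match on the key with merge().
merged <- merge(turtles, turtles.env, by = "id")

# cbind() pairs rows purely by position, so reorder first if needed.
env_sorted <- turtles.env[match(turtles$id, turtles.env$id), ]
combined <- cbind(turtles, env_sorted["Sex"])
```

cbind() does not check any key, so if the two files were sorted differently it would silently pair the wrong rows; merge() avoids that by matching on the common column.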

Regards
Petr


>
> Best wishes,
> Chris
> University of Central Oklahoma