

Dear useRs,
Below is a sample of my dataset (the real one has more rows and columns).
As you can see in the 2nd column, it contains the values, the name of the
parameter ('Sq' in this case), an integer ('45' in this case) and the unit
('µm' or 'nm').
I know how to extract the rows of interest (those with values), but they are
expressed in different units. All values following a unit line are expressed in
that unit, but the number of lines is not constant: sometimes each value has
its own unit line, and sometimes several consecutive values share the same unit
with no unit lines in between. I hope this is clear (it should be with the
example provided).
This messy dataset comes from external software, so I have no control over how
the data are laid out. I have to find a way to deal with it in R.
What I would like to do is convert the values in nm to µm; I just need to
multiply by 1000.
What I don't know is how to identify the values that are expressed in nm (all
values that follow a line with 'nm' until there is a line with 'µm').
I don't even know how to search for this online because I don't know what this
kind of operation is called.
Any help is appreciated.
Thank you in advance.
Ivan
my.data <- structure(list(V1 = c("2019/05/10", "#", "#", "#", "2019/05/10",
"2019/05/10", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
"2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10"), V19 =
c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
"0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
"0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917")), row.names =
c(NA, 20L), class = "data.frame")

Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772243
https://www.researchgate.net/profile/Ivan_Calandra
______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


From nm to micron, _divide_ by 1000.... (as you likely know)
What are the units of the first value? Looks like micron in your example, but is there a rule?
Basically, it is a "last observation carried forward" type problem, so something like this:
my.data <- structure(list(V1 = c("2019/05/10", "#", "#", "#", "2019/05/10",
"2019/05/10", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
"2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10"), V19 =
c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
"0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
"0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917")), row.names =
c(NA, 20L), class = "data.frame")
y <- my.data$V19
u <- ifelse(y == "nm" | y == "µm", y, NA)
num <- my.data$V1 != "#"
uu <- zoo::na.locf(u, na.rm = FALSE)
data.frame(val = as.numeric(y[num]), units = uu[num])
giving
          val units
1   0.2012800  <NA>
2   0.3634383    µm
3   0.4360455    µm
4   0.3767734    µm
5 102.0130480    nm
6   0.1413840    µm
7  65.4459715    nm
8  46.4580292    nm
and you can surely take it from there.
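One possible way to take it from there, sketched in base R so that it runs without the zoo package. The names is_unit, is_value, units and val_um are illustrative and not from the thread, and the sketch assumes that any value appearing before the first unit line is already in µm, which holds for this sample but may not hold elsewhere.

```r
# Sample data from the original post.
my.data <- data.frame(
  V1 = c("2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10", "2019/05/10",
         "#", "#", "#", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
         "2019/05/10", "2019/05/10"),
  V19 = c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
          "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
          "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917"),
  stringsAsFactors = FALSE
)

y <- my.data$V19
is_unit  <- y %in% c("nm", "µm")   # rows that are unit lines
is_value <- my.data$V1 != "#"      # rows that hold measurements

# Base-R "last observation carried forward": cumsum() counts the unit lines
# seen so far, so it indexes the most recent unit. Rows before the first
# unit line map to the leading NA.
units <- c(NA, y[is_unit])[cumsum(is_unit) + 1]

# Assumption (not guaranteed by the data): values before any unit line are in µm.
units[is.na(units)] <- "µm"

# Convert nm to µm by dividing by 1000; leave µm values untouched.
val <- as.numeric(y[is_value])
val_um <- ifelse(units[is_value] == "nm", val / 1000, val)

result <- data.frame(val_um = val_um, original_unit = units[is_value])
print(result)
```

The cumsum() indexing plays the same role as zoo::na.locf(u, na.rm = FALSE) above; with zoo available, uu could be used in place of units.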
pd

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: [hidden email] Priv: [hidden email]


This is my approach. It is based entirely on what you said (multiply by
1000 to convert from nm to µm, though I think it should be divide by). It
assumes that the starting value is in µm. If the starting value is in nm,
change the "factor <- 1" to "factor <- 1000". Since µm is micrometers
(10^-6 m) and nm is nanometers (10^-9 m), dividing would be correct.
factor <- 1
for (i in seq_along(my.data$V19)) {
  print("Start"); print(factor); print(my.data[i, ])
  if (my.data$V19[i] == "nm") {
    factor <- 1000
    my.data$V19[i] <- "µm"
    print("nm")
  } else if (my.data$V19[i] == "µm") {
    factor <- 1
  }
  # Multiplies, as in the original request; use "/ factor" for a correct nm to µm conversion.
  if (suppressWarnings(!is.na(as.numeric(my.data$V19[i])))) {
    my.data$V19[i] <- as.character(as.numeric(my.data$V19[i]) * factor)
    print("changed")
  }
  print(factor); print(my.data[i, ]); print("End")
}
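A loop that remembers the current unit is itself a form of "last observation carried forward". The following is a minimal sketch of that idea, not the code posted above: to_micrometres() is a hypothetical helper, start_unit is an assumed default for values that appear before any unit line, and the conversion divides by 1000. It also uses the first column to skip metadata rows, since an integer line such as "45" parses as a number and would otherwise be rescaled as well.

```r
# Hypothetical helper (not from the thread): walks the column once, remembers
# the most recent unit line, and returns the measurement values in µm.
to_micrometres <- function(values, is_value, start_unit = "µm") {
  current <- start_unit              # assumption: unit in force before any unit line
  out <- numeric(0)
  for (i in seq_along(values)) {
    if (values[i] %in% c("nm", "µm")) {
      current <- values[i]           # a unit line changes the unit for later rows
    } else if (is_value[i]) {
      x <- as.numeric(values[i])
      out <- c(out, if (current == "nm") x / 1000 else x)
    }
    # Other rows ("45", "Sq") are metadata and are ignored.
  }
  out
}

# Sample data from the original post.
V1 <- c("2019/05/10", "#", "#", "#", "2019/05/10", "2019/05/10", "2019/05/10",
        "#", "#", "#", "2019/05/10", "#", "#", "#", "2019/05/10", "#", "#", "#",
        "2019/05/10", "2019/05/10")
V19 <- c("0.2012800083", "45", "Sq", "µm", "0.3634383236", "0.4360454777",
         "0.3767733568", "45", "Sq", "nm", "102.013048", "45", "Sq", "µm",
         "0.1413840498", "45", "Sq", "nm", "65.4459715", "46.45802917")

res <- to_micrometres(V19, V1 != "#")
print(res)
```

Growing out with c() is acceptable for a short column like this one; for long columns a vectorised approach is likely to be faster.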

This is clearly another case of too many mad scientists, and not enough
hunchbacks.
Maranatha! <><
John McKown


Dear John,
Thank you for your answer.
However, it does not make sense to me, as it works only line by line of the
data.frame, and I need something for "last observation carried forward" as Peter
mentioned. The script does not work as is either, probably due to typos with
semicolons and "if... else" statements, so I cannot really test it.
Best,
Ivan



Dear Peter,
Thank you for your answer, the function na.locf() is exactly what I needed!
I had already started processing my dataset, so the first lines (used as
headers) were not included in the sample I sent. In the full data, there is
also a "unit" line before the first value.
And yes, of course, divide by 1000.
Best,
Ivan


