Re: parsing the file

jholtman
Here is an attempt at parsing the data. It is a fixed-field format, so a
regular expression can extract the fields. Some of it does not seem to make
sense, since it has curly brackets in the data.
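
The fixed-field regular-expression idea can be illustrated with a minimal Python sketch that parses one header record from Glenn's sample. The thread is about R; Python is used here only for illustration. The field names and widths are hypothetical. They were inferred from the header records in the quoted sample, because the layout attachment itself is not reproduced in the thread.

```python
import re

# HYPOTHETICAL header layout, inferred from the posted sample records only.
# Replace the names and widths with those from the real layout document.
HEADER = re.compile(
    r"(?P<record_type>1)"      # 1 char, "1" on every header in the sample
    r"(?P<pool_number>\d{6})"  # 6 digits, repeated in the detail record
    r" "                       # 1 blank filler character
    r"(?P<prefix>[A-Z]{2})"    # 2 letters, "CL" or "CI" in the sample
    r"(?P<date_1>\d{8})"       # 8 digits that look like YYYYMMDD
    r"(?P<code>\S{7})"         # 7 characters, meaning unknown
    r"(?P<date_2>\d{8})"       # 8 digits that look like YYYYMMDD
)


def parse_header(text):
    """Return the header fields as a dict of strings; raise ValueError otherwise."""
    m = HEADER.match(text)
    if m is None:
        raise ValueError(f"not a header record: {text[:40]!r}")
    return m.groupdict()


fields = parse_header("1176552 CL20031031367RBV319920901")
print(fields["pool_number"], fields["date_1"])  # 176552 20031031
```

Named groups keep the field list readable and make it easy to rename fields once the real layout is known.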


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sun, Aug 28, 2016 at 8:49 AM, Glenn Schultz <[hidden email]> wrote:

> Hi Jim,
>
> Attached is the layout of the file I would like to parse, along with a dput
> sample of the data.  From the layout it seems to me there are two sets in the
> data: Header and Details.  I would like to parse it so that I have either
>
>    - 1 comma-delimited file of all the data, or
>    - 2 comma-delimited files, one for the header and the other for the details.
>
> I have never seen a file layout described in this manner before.
> Consequently, I am a little confused as to how to work with the file.
>
> Best,
> Glenn
>
> "1176552 CL20031031367RBV319920901
>
>
>  217655208875{08875{08875{08875{08875{08875{22D22D22D22D22D2
> 2D13C13C13C13C13C13C0000604000{0000604000{0000604000{0000604
> 000{0000604000{0000604000{36{36{36{36{36{36{08500{08500{08500{08500{08500{
> 08500{1254240 CL20031031371KLV120020201
>
>
>  225424007484{07250{07375{07500{07625{08625{33F06H33H33I34{3
> 4A02A01I02{02{02A03B0001121957C0000123500{0000920000{0001280
> 000{0001741000{0003849000{35I30{36{36{36{36{07000{07000{07000{07000{07000{
> 07000{1254253 CL20031031371KMA620020301
>
>
>  225425306715{06250{06500{06750{06875{07000{33C23G33C33I34{3
> 4A02{01I02{02{02A02C0000946646A0000350000{0000850000{0001030
> 000{0001205000{0001300000{35H30{36{36{36{36{06000{06000{06000{06000{06000{
> 06000{1259455 CL20031031371RE4420020501
>
>
>  225945507045{06750{06875{07000{07250{07375{34{28B34A34B34B3
> 4C01H01G01H01H01H02C0000934444E0000360000{0000765000{0000995
> 000{0001384000{0002184000{35I30{36{36{36{36{06500{06500{06500{06500{06500{
> 06500{1261060 CI20031031371S5V219940101
>
>
>  226106006637{06500{06500{06625{06750{06875{05B00C04H05I06B0
> 6B11H11G11G11H11H11I0001169090I0000650000{0000950000{0001250
> 000{0001328000{0001900000{18{18{18{18{18{18{06000{06000{06000{06000{06000{
> 06000{1335271 CI20031031375HMU519960101
>
>
>  233527107500{07500{07500{07500{07500{07500{08B06B08E08F08F0
> 8F09D09D09D09D09E09E0000717375{0000464000{0000550000{0000770
> 000{0001085500{0001085500{18{18{18{18{18{18{07000{07000{07000{07000{07000{
> 07000{1440840 CL20031031380HV9519981101
>
>
>  244084006707{06500{06625{06750{06875{06875{27D03C28C29H30{3
> 0A06{05I06{06{06{06A0000615172I0000250000{0000621000{
> 0000673000{0000750000{0000791000{36{36{36{36{36{36{06000{
> 06000{06000{06000{06000{06000{1521993 CI20031031384E3A620000101
>
>
>    252199306937{06875{06875{06875{07000{07000{12H02H12H13{13D1
> 3E04E04E04E04E04F04F0001129428F0000700000{0000955000{0001000
> 000{0002087000{0002087000{18{18{18{18{18{18{06500{06500{0650
> 0{06500{06500{06500{1538080 CL20031031384YXH420000501
>
>
>  253808008875{08875{08875{08875{08875{08875{31I31I31I31I31I3
> 1I04A04A04A04A04A04A0001419300{0001419300{0001419300{0001419
> 300{0001419300{0001419300{36{36{36{36{36{36{07000{07000{07000{07000{07000{
> 07000{1659123 CI20031031390XG8720020801
>
>
>  265912306909{06750{06750{06875{07000{07125{16E15I16C16E16F1
> 6F01E01D01D01E01E01G0000998541G0000162000{0000792000{0001156
> 500{0001600000{0001990000{18{18{18{18{18{18{06000{06000{06000{06000{06000{
> 06000{"
>
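
Glenn's second option (two comma-delimited files) could be sketched in Python as follows. The sketch assumes, from the sample alone, that each header is followed by blank padding and then a 199-character detail record. That detail record is taken to hold six groups of six values, with widths 6, 3, 3, 11, 3 and 6. Those widths and the header split are hypothetical and must be checked against the real layout. The raw field text is written unchanged, so the curly-bracket values still need decoding afterwards.

```python
import csv
import io
import re

# HYPOTHETICAL widths inferred from the posted sample, not from the layout.
HEADER_WIDTHS = [1, 6, 1, 2, 8, 7, 8]  # type, pool, blank, prefix, date, code, date
DETAIL_WIDTHS = [1, 6] + [6] * 6 + [3] * 6 + [3] * 6 + [11] * 6 + [3] * 6 + [6] * 6
DETAIL_LEN = sum(DETAIL_WIDTHS)  # 199 characters per detail record

# A header, optional blank padding, then the detail record that follows it.
PAIR = re.compile(
    r"(1\d{6} [A-Z]{2}\d{8}\S{7}\d{8})\s*(2\d{6}\S{%d})" % (DETAIL_LEN - 7)
)


def slice_fixed(record, widths):
    """Cut a fixed-width record into consecutive fields of the given widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields


def split_records(text):
    """Yield (header_fields, detail_fields) for each header/detail pair."""
    for m in PAIR.finditer(text):
        header = slice_fixed(m.group(1), HEADER_WIDTHS)
        del header[2]  # drop the blank filler column
        yield header, slice_fixed(m.group(2), DETAIL_WIDTHS)


def write_two_csvs(text, header_out, detail_out):
    """Write headers and details to two CSV streams linked by pool number."""
    header_writer, detail_writer = csv.writer(header_out), csv.writer(detail_out)
    for header, detail in split_records(text):
        header_writer.writerow(header)
        detail_writer.writerow(detail[1:])  # pool number + 36 raw fields


# First record of the posted sample; the padding length here is arbitrary.
sample = ("1176552 CL20031031367RBV319920901" + " " * 20 + "2176552"
          + "08875{" * 6 + "22D" * 6 + "13C" * 6 + "0000604000{" * 6
          + "36{" * 6 + "08500{" * 6)

headers, details = io.StringIO(), io.StringIO()
write_two_csvs(sample, headers, details)
print(headers.getvalue())
```

For real files, pass streams opened with `open("header.csv", "w", newline="")` and `open("details.csv", "w", newline="")` (hypothetical file names), as the `csv` module documentation recommends. The shared pool-number column lets the two files be joined again later.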

______________________________________________
[hidden email] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Attachment: parsing.txt (7K)

Re: parsing the file

Jeff Newmiller
Based on the discussion in [1] of ORing values with characters, which can generate "unusual" characters, I suspect a botched conversion from EBCDIC may have corrupted some of the data. If there are signed data fields, the OP may need to read the original file as binary data and do any needed translation themselves to recover the values. Not a task for the faint of heart.

[1] http://www.3480-3590-data-conversion.com/article-reading-cobol-layouts-4.html
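
This example assumes, without confirmation for this particular file, that the trailing `{` and letters are the zoned-decimal "signed overpunch", the kind of sign encoding discussed in [1]. In that encoding, each field's last character carries both its final digit and its sign. The characters `{` and `A`-`I` stand for +0 to +9, and `}` and `J`-`R` stand for -0 to -9. The Python sketch below decodes that text form. It also handles the binary route described above, decoding raw EBCDIC bytes. There, the low nibble of each byte is a digit and the high nibble of the last byte is the sign (0xC or 0xF for positive, 0xD for negative). The number of implied decimal places (`decimals`) is a guess here and would have to come from the layout.

```python
from decimal import Decimal

# Signed overpunch as it typically looks after EBCDIC -> ASCII translation:
# the last character encodes both the final digit and the sign.
_POSITIVE = {ch: d for d, ch in enumerate("{ABCDEFGHI")}  # +0 .. +9
_NEGATIVE = {ch: d for d, ch in enumerate("}JKLMNOPQR")}  # -0 .. -9
_DIGITS = "0123456789"


def decode_overpunch(field, decimals=0):
    """Decode text such as '08875{' (with decimals=4) to Decimal('8.8750')."""
    if not field or any(c not in _DIGITS for c in field[:-1]):
        raise ValueError(f"malformed zoned field: {field!r}")
    last = field[-1]
    if last in _POSITIVE:
        sign, digit = 1, _POSITIVE[last]
    elif last in _NEGATIVE:
        sign, digit = -1, _NEGATIVE[last]
    elif last in _DIGITS:
        sign, digit = 1, int(last)  # unsigned field, no overpunch
    else:
        raise ValueError(f"unrecognised sign character in {field!r}")
    value = Decimal(field[:-1] + str(digit)).scaleb(-decimals)
    return value if sign > 0 else -value


def decode_zoned_bytes(raw, decimals=0):
    """Decode a zoned-decimal field read as raw EBCDIC bytes (binary mode)."""
    if not raw:
        raise ValueError("empty field")
    digits = [b & 0x0F for b in raw]  # low nibble of every byte is a digit
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid digit nibble in {raw!r}")
    zone = raw[-1] >> 4  # high nibble of the last byte carries the sign
    if zone in (0xC, 0xF):
        sign = 1
    elif zone == 0xD:
        sign = -1
    else:
        raise ValueError(f"unrecognised sign zone {zone:#x} in {raw!r}")
    value = Decimal("".join(map(str, digits))).scaleb(-decimals)
    return value if sign > 0 else -value


print(decode_overpunch("08875{", 4))  # 8.8750
print(decode_overpunch("36{"))        # 360
```

For example, with `decimals=0` the sample values `36{` and `18{` decode to 360 and 180. `Decimal` is used so that implied decimal places are applied exactly rather than through floating point.
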
--
Sent from my phone. Please excuse my brevity.

On August 28, 2016 7:26:24 AM PDT, jim holtman <[hidden email]> wrote:
