How to generate an array with a counter in Perl?


0

I want to generate a list of unique IDs. Because some of the IDs are duplicates, I need to add a number at the end to make them unique, like so:

ID=exon00001
ID=exon00002
ID=exon00003
ID=exon00004

Here's what I have so far.

 while (loop through the IDs) {
 # if $id is an exon, then increment the counter by one and add it 
 # to the end of the ID
    if ($id =~ m/exon/) {
    my $exon_count = 0;
    my @exon = $exon_count++; #3
    $number = pop @exon; # removes the first element of the list
    $id = $id.$number;
    print $id."/n"
    }
    }

Basically, I want to dynamically generate an array with a counter. It's supposed to create an array (1, 2, 3, 4, ...) for the total number of exons, then remove the elements one at a time and append each one to the string. This code doesn't work properly. I think there's something wrong with line #3. Any ideas? Thank you.

2012-04-04 23:03
by user1313954
my $exon_count is declared inside the loop, so it is reset to zero on every iteration. Move the declaration before the loop and it will increment as the loop runs. Also, I would just use $exon_count directly instead of doing all of the work to assign it to an array and then pop it into $number, or just use $number and increment that instead - Glenn 2012-04-04 23:10
Your code is riddled with errors, and even if it compiled it would not do what you think. For example: $exon_count is reset each time a new exon is found; you assign a single value (always 0, because the post-increment ++ returns the old value) to an array; pop removes the last element of an array, not the first; and "/n" prints a slash and an n. If you want a newline, you need "\n" - TLP 2012-04-04 23:19
To add to what these guys said: shift removes the first element of a list and pop removes the last. pop does remove the "top" element of a stack, but that's a stack, not a list - Axeman 2012-04-05 03:05
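
The points raised in these comments can be checked with a short script. This is only a minimal sketch; the variable names are made up for illustration and are not taken from the question.

```perl
use strict;
use warnings;

# Post-increment returns the old value, then increments the variable.
my $count = 0;
my $old   = $count++;          # $old is 0, $count is now 1

# shift removes the first element; pop removes the last.
my @list  = (1, 2, 3);
my $first = shift @list;       # 1, leaving (2, 3)
my $last  = pop @list;         # 3, leaving (2)

# "\n" is a newline; "/n" would print a slash followed by the letter n.
print "old=$old count=$count first=$first last=$last rest=@list\n";
```

Run as written, this should print old=0 count=1 first=1 last=3 rest=2.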


1

Is this what you need? The counter needs to retain its value, so you can't keep resetting it as you are:

use v5.10;

my $exon_count = 0;
while( my $id = <DATA> ) {
    chomp $id;
    if( $id =~ m/exon/ ) {
        $id = sprintf "%s.%03d", $id, $exon_count++;
        }
    say $id;
    }

__END__
ID=exon00001
ID=exon00002
ID=exon00003
ID=exon00004

The output looks like:

ID=exon00001.000
ID=exon00002.001
ID=exon00003.002
ID=exon00004.003

If you're on 5.10 or later, you can use state to declare the variable inside the loop but let it keep its value:

use v5.10;

while( my $id = <DATA> ) {
    chomp $id;
    state $exon_count = 0;
    if( $id =~ m/exon/ ) {
        $id = sprintf "%s.%03d", $id, $exon_count++;
        }
    say $id;
    }

I figure you are new to Perl since your code looks like a mishmash of unrelated things that probably do something much different than you think they do. There's a Perl tutorial for biologists, "Unix and Perl". There's also my Learning Perl book.

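Joel asked about using a string as the additional tag. That's fine; Perl lets you increment a string, but the magic only applies to strings matching /^[a-zA-Z]*[0-9]*\z/ (letters followed by digits), and each character stays within its own range (a-z, A-Z, or 0-9). We can mix numbers and letters more freely by keeping a numeric tag and presenting it in base 36: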

use v5.10;

use Math::Base36 'encode_base36';

while( my $id = <DATA> ) {
    chomp $id;
    state $exon_count = 30;
    if( $id =~ m/exon/ ) {
        $id = sprintf "%s.%-5s", $id, encode_base36($exon_count++);
        }
    say $id;
    }

Now you have tags like this:

ID=exon00003.1Q   
ID=exon00004.1R   
ID=exon00001.1S   
ID=exon00002.1T   
ID=exon00003.1U   
ID=exon00004.1V   
2012-04-04 23:30
by brian d foy
+1 for state, one of my favorite new Perlisms - Joel Berger 2012-04-05 04:02
any thoughts on using the magical $string++? All one needs is a unique identifier. Not really needed, but kinda fun - Joel Berger 2012-04-05 04:04
@JoelBerger You can also use the range operator to do this, e.g. for ('exon1' .. 'exon9') { print }. I would not recommend it for new users, as it also increments letters, e.g. exon9 becomes exoo0, not exon10 as one might think - TLP 2012-04-05 10:50
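
The carry behaviour described in the last comment is easy to see in isolation. The following is a small sketch of the magic string increment, with made-up variable names, followed by one possible way to avoid the surprise by keeping the number separate from the prefix.

```perl
use strict;
use warnings;

# Magic increment applies only to strings matching /^[a-zA-Z]*[0-9]*\z/.
my $id = 'exon9';
$id++;                         # the digit wraps and carries into the letters
print "$id\n";                 # exoo0, not exon10

my $tag = 'Az';
$tag++;                        # each character stays within its own range
print "$tag\n";                # Ba

# A separate numeric counter avoids the carry into the letters.
my @ids = map { "exon$_" } 9 .. 11;
print "@ids\n";                # exon9 exon10 exon11
```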


1

As noted in my comment, your code does not compile, and would not work even if it did. Start by counting how many times each ID occurs, then print each ID once per occurrence with a running number appended. printf is well suited to formatting the number.

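use strict;
use warnings;

my %seen;
# sample IDs for illustration only; substitute your own list
my @ids = qw(ID=exon ID=exon ID=intron ID=exon);

$seen{$_}++ for @ids;  # count the duplicates

# sort the keys so the output order is predictable
for my $id (sort keys %seen) {
    for my $num (1 .. $seen{$id}) {
        printf "%s%05d\n", $id, $num;
    }
}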
2012-04-04 23:26
by TLP
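
If the IDs need to keep their original order, a single pass with a per-ID counter is another option. This is a sketch of that variation rather than the approach in the answer above; the sample IDs and the @unique array are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input; real IDs would come from the file being processed.
my @ids = qw(ID=exon ID=intron ID=exon ID=exon);

my %seen;      # how many times each ID has been encountered so far
my @unique;    # the numbered IDs, in input order
for my $id (@ids) {
    # Pre-increment so the first occurrence of each ID is numbered 1.
    push @unique, sprintf "%s%05d", $id, ++$seen{$id};
}

print "$_\n" for @unique;
```

With this sample input the script should print ID=exon00001, ID=intron00001, ID=exon00002 and ID=exon00003, one per line.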


0

You want to generate a list of unique ids for these exons (to output into a GFF file?).

You have to be sure to initialize the counter outside of the loop. I'm not sure what you wanted to accomplish with the array. However, the program below will generate unique exon ids according to the format you posted (exon00001, etc).

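my $exon_count = 0;

while (my $id = <SOMEINPUT>) {
    chomp $id;    # strip the newline so the number lands on the same line
    if ($id =~ m/exon/) {
        $exon_count++;
        # left-pad the counter with zeros to five digits
        my $num = '0' x (5 - length $exon_count) . $exon_count;
        print "$id$num\n";
    }
}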
2012-04-04 23:35
by benjamin
That's a lot of work to pad a number. sprintf does that for you. - brian d foy 2012-04-04 23:38
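
To illustrate that comment, here is a brief sketch comparing the hand-rolled padding with sprintf. The pad5 helper is a hypothetical name introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: zero-pad a number to five digits with sprintf.
sub pad5 { return sprintf '%05d', shift }

for my $n (1, 42, 12345) {
    my $manual = '0' x (5 - length $n) . $n;   # hand-rolled padding
    my $padded = pad5($n);                     # same result via sprintf
    print "$manual $padded\n";
}
```

Both columns should match on every line, for example 00042 00042 for the input 42.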