Excalibur's Sheath

Perl Pig Latin Conversion

Oct 27, 2016 • scripting,perl

Over the last few weeks I’ve been working on a Perl English to Pig Latin script. For testing I used the text of The Time Machine by: H.G. Wells from Project Guttenberg. You can get the full code, and the text I used for testing at my GitHub Repo. The idea of this program, is that it can take an English Language text file, and change it into Pig Latin.

The first paragraph of The Time Machine follows:

The Time Traveller (for so it will be convenient to speak of him) was expounding a recondite matter to us. His grey eyes shone and twinkled, and his usually pale face was flushed and animated. The fire burned brightly, and the soft radiance of the incandescent lights in the lilies of silver caught the bubbles that flashed and passed in our glasses. Our chairs, being his patents, embraced and caressed us rather than submitted to be sat upon, and there was that luxurious after-dinner atmosphere when thought roams gracefully free of the trammels of precision. And he put it to us in this way–marking the points with a lean forefinger–as we sat and lazily admired his earnestness over this new paradox (as we thought it) and his fecundity.

becomes:

Ethay Imetay Avellertray (orfay osay itway illway ebay onvenientcay otay eakspay ofway imhay) asway expoundingway away econditeray attermay otay usway. Ishay eygray eyesway oneshay andway inkledtway, andway ishay usuallyway alepay acefay asway ushedflay andway animatedway. Ethay irefay urnedbay ightlybray, andway ethay oftsay adianceray ofway ethay incandescentway ightslay inway ethay ilieslay ofway ilversay aughtcay ethay ubblesbay atthay ashedflay andway assedpay inway ourway assesglay. Ourway airschay, eingbay ishay atentspay, embracedway andway aressedcay usway atherray anthay ubmittedsay otay ebay atsay uponway, andway erethay asway atthay uxuriouslay afterdinnerway- atmosphereway enwhay oughtthay oamsray acefullygray eefray ofway ethay ammelstray ofway ecisionpray. Andway ehay utpay itway otay usway inway isthay aymarkingway– ethay ointspay ithway away eanlay orefingerasfay– eway atsay andway azilylay admiredway ishay earnestnessway overway isthay ewnay aradoxpay (asway eway oughtthay itway) andway ishay ecundityfay.

The rules I used for the program are simple.

  1. Strip leading consonants to the first vowel.
  2. Attach stripped consonants to the end of the word.
  3. Add ay to the end of the word.
  4. If the word starts with a vowel then add way to the end of it.

The results of my program work very well with English text, but mangle non-English text, like URLs.

The Code

openFile takes in a filename from the commandline, and attempts to open it.

sub openFile
{
        print "What file do you wish to open?\n##";
        chomp (my $fileToOpen = <STDIN>);
        #Opens the dictionary file
        open FILE, $fileToOpen or die $!;
        #Saves the file to an array
        my @source = <FILE>;
        #Returns a pointer to the array
        return \@source;
}

saveFile takes in a filename, checks to see if the file exists, exits if the file exists, or writes the converted text to the file.

sub saveFile
{
        print "What file do you wish to save the converted files to?\n##";
        chomp (my $fileToSave = <STDIN>);
        my $fileOut = shift;
        my @fileOut = @$fileOut;
        if ( -e $fileToSave)
        {
                die "File Exists!\n"
        }
        open (FILE, ">> $fileToSave");
        foreach (@fileOut)
        {
                print FILE "$_\n";
        }
        close (FILE);
}

alterVowelStarts adds “way” to the end of words which begin with a,e,i,o, or u.

sub alterVowelStarts
{
        my $word = shift;
        
        $word = $word."way";
        return $word;
}

findFirstVowel finds the location in a word of the first vowel.

sub findFirstVowel
{
        my $wordIn = shift;
        my @word = split ('', $wordIn);
        my $firstVowelCounter = 0;
        foreach (@word)
        {
                if ($_ =~ m/[^aeiou]/)
                {
                        $firstVowelCounter++;
                }
                else
                {
                        last;
                }
        }
        return $firstVowelCounter;
}

alterConsStarts alters words which begin with a consonent, by moving the starting consonants to the end of the word and adding “ay”.

sub alterConsStarts
{
        my $wordIn = shift;
        my $wordOut = "";
        my $shuffle = "";
        my @word = split ('',$wordIn);
        my $firstVowel = findFirstVowel($wordIn) - 1;
        for (my $i = 0; $i <= $firstVowel; $i++)
        {
                $shuffle = $shuffle.$word[$i];
                $word[$i] = '';
        }
        $wordIn = join ('', @word);
        $wordOut = $wordIn.$shuffle."ay";
        return $wordOut;
}

removeFirstPunct strips any leading punctuation, and returns a reference to an array containing the remaining word, and the punctuation.

sub removeFirstPunct
{
        my $wordIn = shift;
        my $whichFlag = shift;
        my $wordOut = "";
        my $chars = "";
        my @word = split ('', $wordIn);
        my $wordCounter = 0;
        my @returns = ();
                for (@word)
                {
                        if ($_ =~ m/^[^[:alnum:]]/)
                        {
                                $chars = $chars.$word[$wordCounter];
                                $word[$wordCounter] = '';
                        }
                        else
                        {
                                last;
                        }
                        $wordCounter++;
                }
        $wordOut = join('',@word);
        push (@returns, $chars);
        push (@returns, $wordOut);
        return \@returns;
}

removeLastPunct removes any punctuation at the end of the word.

sub removeLastPunct
{
        my $wordIn = shift;
        my $whichFlag = shift;
        my $wordOut = "";
        my $chars = "";
        my @word = split ('', $wordIn);
        my $wordCounter = 0;
        my @returns = ();
                for (@word)
                {
                        if ($_ =~ m/^[[a-zA-Z]]/)
                        {
                        }
                        elsif ($_ =~ m/^[^a-zA-Z]/)
                        {
                                $chars = $chars.$word[$wordCounter];
                                $word[$wordCounter] = '';
                        }
                        $wordCounter++;
                }
        $wordOut = join('',@word);
        push (@returns, $chars);
        push (@returns, $wordOut);
        return \@returns;
}

lcCaps makes all words lowercase, and returns a reference to an array containing a flag indicating if the word was completely capitalized, had first letter capitalization, or was lowercase.

sub lcCaps
{
        my $wordIn = shift;
        my $capWord = uc($wordIn);
        my $firstChar = substr($wordIn, 0, 1);
        my $wordOut = lc($wordIn);
        my $flag = 0;
        if ($firstChar =~ m/[A-Z]/)
        {
                $flag = 1;
        }
        if ($wordIn eq $capWord)
        {
                $flag = 2;
        }
        my @returnVals = ($flag, $wordOut);
        return \@returnVals;
}

alterWord runs the process of converting each word. It calls many of the subroutines.

sub alterWord
{
        my $wordIn = shift;
        my @word = split ('',$wordIn);
        my $wordLen = length $wordIn;
        my $firstChars = "";
        my $firstLetter = "";
        my $capFlag = 0;
        my $lastChars = "";
        my $wordOut = "";
        my $wordMangle = "";
#Remove Starting Punctuation
        my $rFromSub = removeFirstPunct($wordIn);
        my @inFromSub = @$rFromSub;
        $firstChars = $inFromSub[0];
        $wordIn = $inFromSub[1];
#Remove Ending Punctuation
        $rFromSub = removeLastPunct($wordIn);
        @inFromSub = @$rFromSub;
        $lastChars = $inFromSub[0];
        $wordIn = $inFromSub[1];
#Determine capitalization. amd reomove capitalization
        my $capResults = lcCaps($wordIn);
        my @lcCap = @$capResults;
        $capFlag = $lcCap[0];
        $wordIn = $lcCap[1];
##Determine if the word starts with aeiou
        $firstLetter = substr ($wordIn, 0, 1);
        if ($firstLetter =~ m/(a|e|i|o|u)/)
        {
                $wordIn = alterVowelStarts($wordIn);
        }
        elsif ($firstLetter =~ m/[^aeiou]/)
        {
                $wordIn = alterConsStarts($wordIn);
        }
        else
        {
                $wordIn = $wordIn;
        }
#Restore Capitalizations
        if ($capFlag == 1)
        {
                $wordIn = ucfirst($wordIn);
        }
        elsif ($capFlag == 2)
        {
                $wordIn = uc($wordIn);
        }
        $wordOut = $firstChars.$wordIn.$lastChars;
        return $wordOut;
}

splitLine splits each line of the text file into words, converts them, by calling alterWord, and returns a string containing the altered line.

sub splitLine
{
        my $source = shift;
        my $word = "";
        chomp($source);
        my @destination = ();
        my $destiny = "";
        my @sourceArray = split(' ',$source);
        foreach (@sourceArray)
        {
                $word = alterWord($_);
                push (@destination, $word);
        }
        $destiny = join (' ',@destination);
        return $destiny;
}

main runs the whole process. It calls directly, openFile,

sub main
{
        my $sourcePTR = openFile();
        my @source = @$sourcePTR;
        my $lineCounter = 0;
        my @alteredLines = ();
        foreach (@source)
        {
                $alteredLines[$lineCounter] = splitLine($_);
                $lineCounter++;
        }
        saveFile(\@alteredLines);
}