Hi, I'm making a spell checker in C that stores a dictionary in an array of strings and uses binary search to find words in it.
My problem: I am trying to read text from a file and write it back out to a new file with misspelled words highlighted like this: ** spellingmistake **. The file will include characters such as . , ! ? which should be copied to the new file but obviously must not be included when the word is compared against the dictionary.
So I want this:
text file: "worng!"
new file: "** worng **!"
I've been trying to solve this as best I can and have spent quite a while on Google, but I'm not getting any closer to a solution. So far I have written the code below, which reads each character and fills two char arrays: temp, a lowercase copy for the dictionary comparison, and input, the original word. This works if there is no punctuation, but when punctuation is present I obviously lose the space this way. I'm sure there is a better way to do this but I just can't find it, so any pointers would be appreciated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAX_STRING_SIZE 29 /*longest non-technical word in the English dictionary, plus 1 for '\0'*/
/*function prototypes*/
int dictWordCount(FILE *ptrF); /*counts and returns number of words in dictionary*/
void loadDictionary(char ***pArray1, FILE *ptrFile, int counter); /*create dictionary array from file based on word count*/
void printDictionary(char **pArray2, int size); /*prints the words in the dictionary*/
int binarySearch(char **pArray3, int low, int high, char *value); /*recursive binary search on char array*/
int main(int argc, char *argv[]){
    int i; /*index*/
    FILE *pFile; /*pointer to dictionary file*/
    FILE *pInFile; /*pointer to text input file*/
    FILE *pOutFile; /*pointer to text output file*/
    char **dict; /*pointer to array of char pointer - dictionary*/
    int count; /*number of words in dictionary*/
    int dictElement; /*element the word has been found at, or -1 if word not found*/
    char input[MAX_STRING_SIZE]; /*word as it appears in the input, for output*/
    char temp[MAX_STRING_SIZE]; /*lowercase copy of the word, for the dictionary check*/
    int ch; /*each char as read - int rather than char so that EOF can be detected reliably*/
    int numChar = 0; /*number of chars in input string*/
    if(argc != 3){ /*need an input and an output file name*/
        fprintf(stderr, "Usage: %s inputfile outputfile\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    /*************************************************************************************************/
    /*open dictionary file*/
    pFile = fopen("dictionary.txt", "r"); /*open file dictionary.txt for reading*/
    if(pFile==NULL){ /*if file can't be opened*/
        fprintf(stderr, "ERROR: File could not be opened!\n");
        exit(EXIT_FAILURE);
    }
    count = dictWordCount(pFile);
    printf("Number of words is: %d\n", count);
    /*Load Dictionary into array*/
    loadDictionary(&dict, pFile, count);
    /*print dictionary*/
    //printDictionary(dict, count);
    /*************************************************************************************************/
    /*open input file for reading*/
    pInFile = fopen(argv[1], "r");
    if(pInFile==NULL){ /*if file can't be opened*/
        fprintf(stderr, "ERROR: File %s could not be opened!\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    /*open output file for writing*/
    pOutFile = fopen(argv[2], "w");
    if(pOutFile==NULL){ /*if file can't be opened*/
        fprintf(stderr, "ERROR: File could not be created!\n");
        exit(EXIT_FAILURE);
    }
    do{
        ch = fgetc(pInFile); /*read char from file*/
        if(isalpha((unsigned char)ch)){ /*if char is alphabetical char*/
            //printf("char is: %c\n", ch);
            input[numChar] = ch; /*put char into input array*/
            temp[numChar] = tolower(ch); /*put char in temp in lowercase for dictionary check*/
            numChar++; /*increment char array element counter*/
        }
        else{
            if(numChar != 0){
                input[numChar] = '\0'; /*add end of string char*/
                temp[numChar] = '\0';
                dictElement = binarySearch(dict,0,count-1,temp); /*check if word is in dictionary*/
                if(dictElement == -1){ /*word not in dictionary*/
                    fprintf(pOutFile,"**%s**%c", input, ch);
                }
                else{ /*word is in dictionary*/
                    fprintf(pOutFile, "%s%c", input, ch);
                }
                numChar = 0; /*reset numChar for next word*/
            }
        }
    }while(ch != EOF);
    /*******************************************************************************************/
    /*free allocated memory*/
    for(i=0;i<count;i++){
        free(dict[i]);
    }
    free(dict);
    /*close files*/
    fclose(pInFile);
    fclose(pOutFile);
    return 0;
}
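The binarySearch function itself isn't shown in the question, only its prototype. For reference, here is a minimal sketch of a recursive implementation matching that prototype. It is not the asker's code, and it assumes the dictionary array is sorted in strcmp() order and stored in lowercase (the same form as temp), which binary search requires. Separately, if dictWordCount reads pFile to the end, loadDictionary will have to rewind the file (for example with rewind()) before it can read the words again.

```c
#include <string.h>

/* Sketch: recursive binary search over a sorted array of strings.
 * Assumes pArray3[low..high] is sorted in strcmp() order, e.g. an
 * all-lowercase dictionary. Returns the index of value, or -1. */
int binarySearch(char **pArray3, int low, int high, char *value)
{
    if (low > high) {
        return -1;                             /* empty range: not found */
    }
    int mid = low + (high - low) / 2;          /* avoids overflow of low + high */
    int cmp = strcmp(value, pArray3[mid]);
    if (cmp == 0) {
        return mid;                            /* found */
    } else if (cmp < 0) {
        return binarySearch(pArray3, low, mid - 1, value);   /* search left half */
    } else {
        return binarySearch(pArray3, mid + 1, high, value);  /* search right half */
    }
}
```

With the question's variables the call stays binarySearch(dict, 0, count-1, temp), which returns -1 when temp is not in the dictionary.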
I'm not 100% sure I've understood your problem correctly, but I'll give it a shot.
First, your loop
do{
    ch = fgetc(pInFile);
    /* do stuff */
}while(ch != EOF);
also runs when the end of the file has been reached. So if the last byte of the file is alphabetical, one of two things happens: either you print an unwanted EOF byte to the output file, or, because you cast ch to unsigned char when passing it to isalpha() (which usually gives 255 for EOF = -1 and an 8-bit unsigned char), that value is considered an alphabetic character in some locales (en_US.iso885915, for example), which results in the last word of the input file being suppressed.
To deal with this, first, don't cast ch when passing it to isalpha() (ch should be an int, which is what fgetc() returns), and second, add some logic to the loop to prevent unintentional handling of EOF. I chose to replace EOF with a newline when needed, since that's simple.
Then all that remains is to print the non-alphabetic characters that don't immediately follow alphabetic characters:
do{
    ch = fgetc(pInFile); /*read char from file*/
    if(isalpha(ch)){ /*if char is alphabetical char*/
        //printf("char is: %c\n", ch);
        input[numChar] = ch; /*put char into input array*/
        temp[numChar] = tolower(ch); /*put char in temp in lowercase for dictionary check*/
        numChar++; /*increment char array element counter*/
    }
    else{
        if(numChar != 0){
            input[numChar] = '\0'; /*add end of string char*/
            temp[numChar] = '\0';
            dictElement = binarySearch(dict,0,count-1,temp); /*check if word is in dictionary*/
            if(dictElement == -1){ /*word not in dictionary*/
                fprintf(pOutFile,"**%s**%c", input, (ch == EOF) ? '\n' : ch);
            }
            else{ /*word is in dictionary*/
                fprintf(pOutFile, "%s%c", input, (ch == EOF) ? '\n' : ch);
            }
            numChar = 0; /*reset numChar for next word*/
        }
        else{ /*non-alphabetic char not following a word: copy it unless it is EOF*/
            if(ch != EOF){
                fprintf(pOutFile, "%c", ch);
            }
        }
    }
}while(ch != EOF);
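A common alternative to special-casing EOF inside the do/while body, as done in the loop above, is to test for EOF before the character is used at all (with ch declared as int) and to flush any word still being collected once the loop ends. A minimal sketch of just that loop structure; printWords is a hypothetical helper that writes each word on its own line instead of checking spelling, and over-long words are truncated rather than overflowing the buffer:

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_STRING_SIZE 29

/* Hypothetical helper: writes every alphabetic word in `in` to `out`,
 * one per line, to show the read-loop structure only. */
void printWords(FILE *in, FILE *out)
{
    char word[MAX_STRING_SIZE];
    int numChar = 0;
    int ch;                                    /* int, so EOF is distinguishable */

    while ((ch = fgetc(in)) != EOF) {          /* EOF is never processed as data */
        if (isalpha(ch)) {
            if (numChar < MAX_STRING_SIZE - 1) {   /* leave room for '\0'; extra letters dropped */
                word[numChar++] = (char)ch;
            }
        } else if (numChar != 0) {             /* a non-letter ends the current word */
            word[numChar] = '\0';
            fprintf(out, "%s\n", word);
            numChar = 0;
        }
    }
    if (numChar != 0) {                        /* input ended in the middle of a word */
        word[numChar] = '\0';
        fprintf(out, "%s\n", word);
    }
}
```

The trade-off is that the end-of-word handling appears twice (inside the loop and after it), in exchange for never having to ask "is this EOF?" inside the loop body.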
Casting the result of fgetc or getchar isn't necessary. Functions like isalpha take an int argument that must be EOF or the value of an unsigned char, which is exactly what you get from fgetc, so no cast is needed there and, if the result is EOF, a cast may be harmful. For functions that can't handle EOF, you have to check for it before calling, and then casting can't do harm (and if the function takes an unsigned char argument, a cast may be necessary to avoid a compiler warning). Rule of thumb: don't cast unless you know it's necessary or the compiler tells you to. - Daniel Fischer 2012-04-05 12:47
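To illustrate the rule of thumb in that comment: a value returned by fgetc can be passed straight to isalpha, while a plain char taken from a string should be converted to unsigned char first, because char may be signed and negative values other than EOF are not valid arguments for the <ctype.h> functions. A small sketch; countLetters and countLettersInStream are hypothetical helpers:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: counts letters in a C string. Each char is
 * converted to unsigned char because plain char may be signed. */
size_t countLetters(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char)*s)) {
            count++;
        }
    }
    return count;
}

/* Hypothetical helper: counts letters read from a stream. fgetc() already
 * returns EOF or an unsigned char value, so no cast is needed, and EOF is
 * checked before isalpha() is ever called. */
size_t countLettersInStream(FILE *fp)
{
    size_t count = 0;
    int ch;
    while ((ch = fgetc(fp)) != EOF) {
        if (isalpha(ch)) {
            count++;
        }
    }
    return count;
}
```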
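It looks like right now, if the char isn't alphabetical, it triggers the else block of if(isalpha((unsigned char)ch)){ and, unless it immediately follows a word, the character itself gets ignored.
If you add a statement that simply prints every non-alphabetical character exactly as it comes in, I think that would accomplish what you want. It would go inside that else block, after the if(numChar != 0){ block, and would just be a simple fprintf statement (guarded so that EOF itself isn't printed). You would then also drop the %c from the two fprintf calls inside the if(numChar != 0){ block, so that the character ending a word isn't printed twice.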
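Put together, the approach in this answer could look roughly like the sketch below: a finished word is written without the character that ended it, and every non-alphabetic character except EOF is then copied through unchanged. To keep it self-contained, the dictionary is a small hard-coded sorted array searched with the standard bsearch() instead of the question's dict and binarySearch; knownWords, compareWords, isKnownWord and highlightMisspellings are hypothetical names. It uses the "** word **" format from the question's example output.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STRING_SIZE 29

/* Hypothetical stand-in for the dictionary: sorted in strcmp() order, lowercase. */
static const char *const knownWords[] = {"a", "is", "this", "word"};

/* bsearch() comparator: key is the word itself, elem points at an array entry. */
static int compareWords(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

static int isKnownWord(const char *lowerWord)
{
    size_t n = sizeof knownWords / sizeof knownWords[0];
    return bsearch(lowerWord, knownWords, n, sizeof knownWords[0], compareWords) != NULL;
}

/* Copies `in` to `out`, wrapping unknown words as "** word **" and passing
 * punctuation, spaces and newlines through unchanged. */
void highlightMisspellings(FILE *in, FILE *out)
{
    char input[MAX_STRING_SIZE];   /* word as it appears in the text */
    char temp[MAX_STRING_SIZE];    /* lowercase copy for the lookup */
    int numChar = 0;
    int ch;

    do {
        ch = fgetc(in);
        if (isalpha(ch)) {                         /* isalpha(EOF) is false */
            if (numChar < MAX_STRING_SIZE - 1) {   /* extra letters are dropped */
                input[numChar] = (char)ch;
                temp[numChar] = (char)tolower(ch);
                numChar++;
            }
        } else {
            if (numChar != 0) {                    /* a word just ended: print it alone */
                input[numChar] = '\0';
                temp[numChar] = '\0';
                if (isKnownWord(temp)) {
                    fputs(input, out);
                } else {
                    fprintf(out, "** %s **", input);
                }
                numChar = 0;
            }
            if (ch != EOF) {
                fputc(ch, out);                    /* then copy the non-letter itself */
            }
        }
    } while (ch != EOF);
}
```

For example, the input "This is a worng word!" should come out as "This is a ** worng ** word!", and a misspelled word at the very end of the file is still flushed because the EOF pass goes through the else branch.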