I have one last item I would like to discuss pertaining to the use of HTML forms: the question of how to store unvetted input data. I will show that it makes sense to store news items as formatted, ready-to-publish summaries in separate text files, each containing one story. I further suggest they be deposited in an off-site directory with severely limited access rights. The formatting for news items is simple, using either a template or an informed (trained) user. I show how I would have implemented a simple file naming structure that makes news items easier to work with by giving them an inherent date / time ordering. For this particular, limited instance, I think my model would have worked.
Obviously there are a number of tasks I mentioned at the start of this series that I will not address. However, I now think it more productive to move on to topics tested on extant sites. The thought experiments, while interesting, have run their course. I suspect we all want something more concrete, i.e. proven to have worked, over a design offered as a guide that probably has uncaught flaws.
At first sight this may seem like a needless digression; however, bear with me and I will explain its pertinence [2.]. The spam attack I saw first hand on an active form was a sloppy, untargeted dumping of inappropriate debris into any input field that was available. Moreover, whatever potential it had for success was completely obviated by the indiscriminate dumping of spam links. The purpose seems to be to inflate the statistics, when at its base the service is garbage, unless the play is a devalued one with a low percentage of return.
My assertion is simple: these attacks are primarily [3.] ones of opportunity, i.e. they are untargeted. Hence, any open input will be filled by default. A news item, however, requires that several fields be left empty, so it is much less likely that the news item entries will be compromised with trash entries. Therefore, news item content is a good candidate for the methods I propose.
For probably the last time, I exhibit the OpenSourceToday data entry form. I think the graphic makes my point quicker than a text description:
Figure 1. News Item/Article Input HTML Form
Notice that the Title, Article Summary, Key Words and even the File Name could be left blank. Indeed, to be a News item the first three must be empty. In addition, the total size of the Article could be used to dispense with obvious frauds. Therefore, news items meeting these criteria are very likely to be real.
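A minimal sketch of how those criteria might be checked follows. The field names (title, summary, keywords, article) and the size limits are my assumptions for illustration; the real form may have used different names, and the limits would need tuning for an actual site.

```php
<?php
// Hypothetical size window for a news item, in bytes (assumed values).
const NEWS_MIN_BYTES = 50;
const NEWS_MAX_BYTES = 4000;

/**
 * Returns true when a submission looks like a genuine news item:
 * Title, Article Summary and Key Words are empty, and the Article
 * body falls inside the assumed size window.
 */
function looks_like_news_item(array $form): bool
{
    foreach (['title', 'summary', 'keywords'] as $field) {
        $value = $form[$field] ?? '';
        // A missing or whitespace-only field counts as empty;
        // anything else (including non-string input) fails the check.
        if (!is_string($value) || trim($value) !== '') {
            return false;
        }
    }

    $article = $form['article'] ?? '';
    if (!is_string($article)) {
        return false;
    }
    $size = strlen(trim($article));

    return $size >= NEWS_MIN_BYTES && $size <= NEWS_MAX_BYTES;
}
```

An automated depositor that fills every open field fails the first test, while a submission that is far too short or far too long fails the second.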
The template is embarrassingly simple:
<h2 id="central-text">[News Item Title]</h2>
<p>[News Item, 1st Paragraph]</p>
<p>[News Item, 2nd Paragraph]</p>
...
Listing 1. News Item Template
The writer needs only to insert the item's title and to begin each paragraph with a <p> and end it with </p>. The task is numbingly simple; hence, the content deposited in the Article input text region should be publishable as it stands.
The only customization in the template is the specified headline tag, which is styled by the site's cascading style sheet. Its presence could be a useful validation criterion, which is one more way to say the content for this class of input can be safely stored.
At this stage the only real outstanding question is how to name the files. As I already strongly indicated in my previous article, the structure seemed both simple and obvious: the name starts with either simply "news" or "news-item", followed by a time stamp of when the file was created. The actual time shown in a directory listing was maintained in the code by using the "archive" option when the file was copied.
As I wrote in the article cited just above, the stored version resided in an off-server directory and its name was built like this:
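As a rough illustration of using the headline tag as a validation criterion, the sketch below tests whether the submitted text opens with the template's h2 element. The function name is hypothetical and the check is deliberately narrow: it confirms only the headline, not the paragraphs that follow.

```php
<?php
/**
 * Returns true when the submitted article starts with the template
 * headline, <h2 id="central-text">...</h2>, holding a non-empty title.
 */
function has_template_headline(string $article): bool
{
    // Leading whitespace is tolerated; the title may not contain markup.
    $pattern = '~^\s*<h2 id="central-text">[^<>]+</h2>~';

    return preg_match($pattern, $article) === 1;
}
```

A submission that lacks the headline, or leaves it empty, could then be set aside before anything is written to storage.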
$news_temp = $path . "/" . $news . "-" . $digital_time_stamp . ".txt";
The $path we want is an off-server directory. If we were running our own server, I have seen suggestions that /usr with a sub-directory beneath it would be a good choice. However, I am less than sanguine about the choices offered on a shared server. Moreover, I am uncertain what to suggest that I think would guarantee secure storage. Therefore, the value of $path has to be determined by what is perceived best for the site applying this sort of coding device.
It is time to turn our attention to the other variables involved in the temporary file naming and the method that might have been used to move content into these files:
// this would have
// worked for me
$news = "news";
// H gives a two digit hour (00 - 23);
// G would drop the leading zero
$digital_time_stamp = date("Y-m-d-H:i:s");
// example of file name [4.]
// $news_temp = $path . "/news-2008-09-18-08:32:23.txt";
Listing 2. Creating Temporary Storage File Name in Safe Directory
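To see why zero-padded fields matter for the naming scheme, here is a small sketch that builds two names for the same day and compares them as plain strings. The function name and the /tmp/news-holding path are placeholders of mine, not a recommendation for where to store the files.

```php
<?php
/**
 * Builds a storage file name of the form news-YYYY-MM-DD-HH:MM:SS.txt
 * inside $path for the given Unix time stamp.
 */
function news_temp_name(string $path, int $when): string
{
    return rtrim($path, '/') . '/news-' . date('Y-m-d-H:i:s', $when) . '.txt';
}

// mktime() takes hour, minute, second, month, day, year.
$morning = news_temp_name('/tmp/news-holding', mktime(8, 32, 23, 9, 18, 2008));
$evening = news_temp_name('/tmp/news-holding', mktime(19, 5, 0, 9, 18, 2008));

// Because every field is zero padded, comparing the names as strings
// gives the same order as comparing the times themselves.
var_dump(strcmp($morning, $evening) < 0); // bool(true)
```

With an unpadded hour, "8:32:23" would sort after "19:05:00" and the date / time ordering of a directory listing would break.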
The next step is moving content into the file created with this name. In addition, we have to define the content and write it into this file:
// the variable name in the form
// was actually article, not the
// content used in other articles
$content = $article;
$tmp_file_handle = fopen($news_temp, "w");
// stop here if the file could not be opened
if ($tmp_file_handle === false) {
    die("Unable to open temporary news file");
}
// the title for the news item is
// needed for the email seeking
// confirmation from the author
fwrite($tmp_file_handle, $content);
// if no error close
fclose($tmp_file_handle);
Listing 3. Writing Content into Temporary Storage File
Other than a clean copying of this file into the production directory, this file is only opened to be read, if the file-only method is employed. For this class of input these simple steps suffice for storage and later transfer for use in page rendering.
News items would have been special because the limited inputs would have fooled most automated spam depositors, lessening the risk. Confirmation would have been easier. These and other factors made it simpler to envision a safer means of storage, i.e. dump the content, ready to use, in an off-server directory until vetted. Then the content was ready for immediate use. There is a final aspect that makes this content less critical: it is of short-lived value with no need for archival storage. Therefore, flaws and losses are less painful.
This means of storage would have worked best when the content in separate files was read sequentially from a date / time ordered listing. Then each file would have been read as it popped off the current News Page stack. The combination of a single set of files with consistent naming and the reliance upon separate, unaltered files is one of the best arguments against the need for databases as the foundation for every web site. Though I am still biased towards databases, I like the second offered solution for News Page automation over the first, which has to recreate too much readable content upon every use.
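The copy into production might look something like the sketch below. It is an untested guess in the spirit of the other listings: the function name is mine, and carrying the modification time across with touch() is only my approximation of what an archive-style copy preserves.

```php
<?php
/**
 * Copies a vetted news file into the production directory and keeps
 * the original modification time on the copy.
 * Returns false if the source is missing or any step fails.
 */
function publish_news_file(string $vetted_file, string $production_dir): bool
{
    if (!is_file($vetted_file)) {
        return false;
    }

    $target = rtrim($production_dir, '/') . '/' . basename($vetted_file);
    $mtime = filemtime($vetted_file);

    if ($mtime === false || !copy($vetted_file, $target)) {
        return false;
    }

    // Reset the time stamp on the copy so a listing still shows
    // when the item was created, not when it was published.
    return touch($target, $mtime);
}
```

The stored original is left untouched, so a failed copy can simply be retried.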
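A date / time ordered listing falls out of the file names almost for free. The following is a sketch, assuming the news-*.txt naming used in this article and a hypothetical function name; it returns the files with the newest story first.

```php
<?php
/**
 * Returns the news files found in $dir, newest first.
 * Relies on the zero-padded time stamp in each name, so that
 * sorting the names as strings also sorts them by time.
 */
function news_files_newest_first(string $dir): array
{
    $files = glob(rtrim($dir, '/') . '/news-*.txt');
    if ($files === false) {
        return [];
    }

    // Reverse string sort puts the latest time stamp at the top.
    rsort($files, SORT_STRING);

    return $files;
}
```

Each entry can then be read in turn as the page is built, with no index or database involved.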
In this series, I stressed that the concepts and code were thought experiments where I gave free rein to the concept of what I might have done had the OpenSourceToday site not been terminated. Hence, I remind you again that the code listings were guesses, not tested recipes. Moreover, I regard even proven code, i.e. processes that are shown to work, as not set in stone. One example that was on my mind while writing this article was the second method to exhibit news items. In particular, I would have altered Listing 7. to make the code more resilient. I would have made the loop increment number dependent upon opening the content file. If the file open function failed, I would have just rerun the loop without incrementing the count. That way, if some files were corrupted, a page would still show with stories, albeit with out-of-date content. The latter would have been an issue for the web master, who would have been notified of the failure.
The major point I wish to stress is: do not stop thinking about how to improve your code. Even working, functional code could have good reasons justifying alteration. It helps, too, to think in reusable blocks. That is, code residing in library functions or in class designs requires less effort when testing the validity of different approaches. Web pages that are unique in their application of code are a route to an unsupportable site once it gains bulk. Therefore, I advise thinking of common components where a change in one location affects the total site.
As a more extreme example, one that contradicts my stated assumptions in this article, just go to the upper end of this article where I listed input fields that should or could be left empty. Notice that very little was said of data verification. The title could have played a role in two different contexts. For example, if inserted, the title could help identify the content to be verified by the supposed author. Or the actual title within the h2 tags could have been used, making the Title input field redundant and, hence, better left empty. Therefore, mix up your thought processes to see your application in a differing light that may improve its quality.
Those ideas will guide my writing on the code that will follow once I finally take on some mundane tasks I kept putting off. Now that the thought experiments are no longer paramount, the janitorial work begins as real tasks on extant sites.
For corrections, suggested extensions or comments, write: H. Cohen.
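The alteration I have in mind could be sketched as below. This is not the earlier Listing 7. itself, just a guess at the shape of the change: the function name is hypothetical, and error_log() stands in for whatever means would actually notify the web master.

```php
<?php
/**
 * Reads up to $wanted stories from the given files, in the order given.
 * The story count grows only when a file is read successfully, so an
 * unreadable or empty file is skipped and the next one takes its slot.
 */
function read_latest_stories(array $files, int $wanted): array
{
    $stories = [];

    foreach ($files as $file) {
        if (count($stories) >= $wanted) {
            break;
        }

        $text = is_readable($file) ? file_get_contents($file) : false;
        if ($text === false || $text === '') {
            // Stand-in for notifying the web master of the failure.
            error_log("News file unusable: " . $file);
            continue;
        }

        $stories[] = $text;
    }

    return $stories;
}
```

If the newest file is corrupted, the page still fills with the requested number of stories, drawn from somewhat older files.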
© Herschel Cohen, All Rights Reserved
____________________________________________________________________
1. Just the last for this series. Also the simplest; however,
there are more issues, even for this site, that remain
unaddressed.
2. This does not mean the code and methods were stupid;
the creators many times are not the ones running the
businesses, hence, one should not base an assessment
totally on the actions of the latter types. The wit and
criminality of these two groups differ.
3. See article footnote 3, of listed links to past
discussions.
4. The codes mean:
Y is a four digit year;
m a two digit month;
d a two digit day;
H a two digit hour representation (00 - 23), whereas G
gives the hour without a leading zero (0 - 23);
i two digit minutes and s two digit seconds (but can be
dropped).
____________________________________________________________________