How can I tell the difference between a post from a browser and someone trying to post programmatically?


1

Is there a way to determine if the request coming to a handler (let's assume the handler responds to GET and POST) is being performed by a real browser versus a programmatic client?

I already know that it is easy to spoof things like the User-Agent and the Referer, but are there other headers that are more difficult to spoof? Maybe headers that are not commonly available in classes like .NET's HttpWebRequest?

The other path that I looked at is maybe using the Encrypted View State to send a value to the browser that gets validated on the server side, though couldn't that value simply be scraped from the previous response and added as a post parameter to the next request?
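To make the scraping concern concrete, here is a minimal sketch, assuming a server that signs a hidden-field value with an HMAC. The names `SERVER_KEY`, `issue_token`, `verify_token`, `render_form`, and `scrape_token` are hypothetical and this is not how ASP.NET View State is implemented. It only illustrates that a signature proves the server issued the value, not who is sending it back.

```python
import hashlib
import hmac
import re
import secrets

# Hypothetical server secret; a real application would load this from configuration.
SERVER_KEY = secrets.token_bytes(32)


def issue_token() -> str:
    """Return 'nonce.signature', as a server might embed in a hidden form field."""
    nonce = secrets.token_hex(16)
    sig = hmac.new(SERVER_KEY, nonce.encode(), hashlib.sha256).hexdigest()
    return f"{nonce}.{sig}"


def verify_token(token: str) -> bool:
    """Check that the token was issued by this server and has not been altered."""
    nonce, _, sig = token.partition(".")
    expected = hmac.new(SERVER_KEY, nonce.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)


def render_form() -> str:
    """Simulated response body containing the hidden field."""
    token = issue_token()
    return f'<form method="post"><input type="hidden" name="token" value="{token}"></form>'


def scrape_token(html: str) -> str:
    """What a scripted client would do: pull the value out of the previous response."""
    match = re.search(r'name="token" value="([^"]+)"', html)
    if match is None:
        raise ValueError("no token field found")
    return match.group(1)


# A script that fetched the form can replay the token, and the server accepts it.
scraped = scrape_token(render_form())
print(verify_token(scraped))
```

The check succeeds for the scraped value because the server has no way to tell whether a browser or a script copied the field into the next request.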

Any help would be much appreciated, Cheers,

2012-04-05 23:06
by Stefan H
Have you considered that it's trivial to use Fiddler to record authentic browser requests, make edits, and resend them with new values? How would you propose identifying that from the remote server? - asawyer 2012-04-05 23:08
Short answer: you can't - Oliver Charlesworth 2012-04-05 23:08
@asawyer I have indeed considered that. You'll notice that I am not proposing anything, but rather asking if there is a way it can be done - Stefan H 2012-04-05 23:09
@OliCharlesworth and that is indeed the sneaking suspicion that I have, but I am holding out hope that someone has a novel way of doing it - Stefan H 2012-04-05 23:09
I suppose that this is the problem that captchas aim to solve - Stefan H 2012-04-05 23:10
Yes, I should qualify my previous comment. There is certainly no reliable way of doing this without some kind of deliberate intrusive interaction with the user (captchas are the canonical example). But even they aren't reliable; for every captcha there'll eventually be an AI that can beat it - Oliver Charlesworth 2012-04-05 23:19
I wholeheartedly disagree with Oli Charlesworth. The key is combinatorial complexity - FlavorScape 2012-04-05 23:35
Check out vouchsafe's solution/research... http://www.vouchsafe.com/play-games try automating that - FlavorScape 2012-04-06 00:47


1

There is no easy way to differentiate because, in the end, a post made programmatically looks the same to the server as a post made by a user from the browser.

As mentioned, captchas can be used to control posting but are not perfect (it is very hard, but not impossible, for a computer to solve them). They can also annoy users.

Another route is only allowing authenticated users to post, but this can still be done programmatically.
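As a rough illustration of why the two look the same, here is a sketch using Python's standard `urllib.request`. The URL, the form field, and the header values are placeholders invented for this example. The request is built but never sent; the point is only that every header the server would see is chosen by the client.

```python
import urllib.parse
import urllib.request
from typing import Mapping

# Header values a scripted client might copy from a recorded browser session.
# These strings are illustrative placeholders, not real browser output.
BROWSER_LIKE_HEADERS = {
    "User-Agent": "Mozilla/5.0 (example browser string)",
    "Referer": "http://example.com/form",
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.8",
}


def build_post(url: str, fields: Mapping[str, str]) -> urllib.request.Request:
    """Build (but do not send) a POST whose headers are whatever the caller chooses."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=body, headers=BROWSER_LIKE_HEADERS)


request = build_post("http://example.com/handler", {"comment": "hello"})
print(request.get_method())
for name, value in request.header_items():
    print(f"{name}: {value}")
```

Nothing in the resulting request marks it as scripted, which is why header inspection alone cannot settle the question.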

If you want to get a good feel for how people are going to try to abuse your site, then you may want to look at http://seleniumhq.org/

This is very similar to the famous Halting Problem in computer science. See some more on the proof, and Alan Turing here: http://webcache.googleusercontent.com/search?q=cache:HZ7CMq6XAGwJ:www-inst.eecs.berkeley.edu/~cs70/fa06/lectures/computability/lec30.ps+alan+turing+infinite+loop+compiler&cd=1&hl=en&ct=clnk&gl=us

2012-04-05 23:23
by Travis J
I'm not sure this has anything to do with the halting problem - Oliver Charlesworth 2012-04-05 23:31
@OliCharlesworth - I believe it does, in that any programmatic approach taken to prevent a program from posting can be programmatically countered - Travis J 2012-04-05 23:32


1

The most common way is using captchas. Of course, captchas have their own issues (users don't really care for them), but they do make it much more difficult to post data programmatically. This doesn't really help with GETs, though you can force clients to solve a captcha before delivering content.

2012-04-05 23:17
by Brian


-2

Many ways to do this, like dynamically generated XHR requests that can only be made with human tasks.

Here's a great article on NP-Hard problems. I can see a huge possibility here: http://www.i-programmer.info/news/112-theory/3896-classic-nintendo-games-are-np-hard.html

One way: you could use some tricky JS to handle tokens on click. Your server issues token IDs to elements on the page during the backend render phase and logs them in a database or data file. Then, when users click around and submit, you can compare the IDs sent via the onclick() function. There are plenty of ways around this, but you could apply some heuristics to determine whether posts are too fast to be human: even if someone scripted the hijacking of the token IDs and auto-submitted, you could check whether the time between click events appears automated. Signed up for a Twitter account lately? They use passive human detection that, while not 100% foolproof, is slower and more difficult to break. Many if not all of the spam accounts there had to be opened by a human.
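The token-and-timing idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not a tested defence: the class name `ClickTracker`, the in-memory storage, and both thresholds are invented for the example, and a real deployment would persist tokens server-side and tune the numbers.

```python
import secrets
import statistics
import time
from typing import Dict, List, Optional

# Assumed thresholds for illustration only; real values would need tuning.
MIN_MEAN_GAP_SECONDS = 0.3    # clicks faster than this on average look scripted
MIN_GAP_STDEV_SECONDS = 0.02  # near-identical gaps look scripted


class ClickTracker:
    """Issues per-element token ids at render time and records when each is clicked."""

    def __init__(self) -> None:
        self._issued: Dict[str, str] = {}  # token id -> element name
        self._click_times: List[float] = []

    def issue(self, element: str) -> str:
        """Create a token id for one page element during rendering."""
        token = secrets.token_hex(8)
        self._issued[token] = element
        return token

    def record_click(self, token: str, now: Optional[float] = None) -> bool:
        """Record a click; return False if the token was never issued or was already used."""
        if self._issued.pop(token, None) is None:
            return False
        self._click_times.append(time.monotonic() if now is None else now)
        return True

    def looks_automated(self) -> bool:
        """Heuristic only: flag click sequences that are too fast or too regular."""
        if len(self._click_times) < 3:
            return False  # not enough data to judge
        gaps = [b - a for a, b in zip(self._click_times, self._click_times[1:])]
        return (statistics.mean(gaps) < MIN_MEAN_GAP_SECONDS
                or statistics.stdev(gaps) < MIN_GAP_STDEV_SECONDS)
```

A script that sleeps for randomised, human-looking intervals between clicks would pass this check, so it raises the cost of automation rather than preventing it.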

Another Way: http://areyouahuman.com/

As long as you are using encrypted methods, verifying humanity without crappy CAPTCHA is possible. I mean, don't ignore your headers either. These are complementary approaches.

The key is to have enough complexity to make for an NP-complete problem, in the sense that the number of ways to solve the total number of problems is extraordinary. http://en.wikipedia.org/wiki/NP-complete

When the day comes that AI can solve multiple complex human problems on its own, we will have other things to worry about than request tampering.

http://louisville.academia.edu/RomanYampolskiy/Papers/1467394/AI-Complete_AI-Hard_or_AI-Easy_Classification_of_Problems_in_Artificial

Another company doing interesting research is http://www.vouchsafe.com/play-games. They actually use games designed to trick the RTT (reverse Turing test) into training the RTT how to be more solvable by only humans!

2012-04-05 23:09
by FlavorScape
The answer to the OP's question is: no, there is no way. Suggestions for ad-hoc trickery like this just add code complexity, and don't solve the problem, because it's not solvable - Oliver Charlesworth 2012-04-05 23:12
You are wrong. Task puzzles are perfectly valid ways of doing a Turing test. Many ways to do this, like dynamically generated XHR requests that can only be made with human tasks - FlavorScape 2012-04-05 23:23
As I said in a comment further up, for every captcha-like puzzle, there will eventually be an algorithm that can beat it. The answer you proposed can easily be circumvented by a deliberate attacker - Oliver Charlesworth 2012-04-05 23:26
You're wrong. Cracking a large variety of puzzles with N-dimensional complexity is an immense human intelligence task-- if you have enough combinatorial complexity, there's no way to brute force it, because that is easy to detect. As long as you have enough combinations to make it an NP-Complete problem, you're good - FlavorScape 2012-04-05 23:29
Who said anything about brute force? - Oliver Charlesworth 2012-04-05 23:35
Well you can't write an algorithm for an NP complete problem, so you'd have to brute force requests which is easy to block and detect - FlavorScape 2012-04-05 23:36
Do you have an example? Specifically, an example of something that's trivially solvable by a human that couldn't conceivably be solved one day by sufficiently-advanced AI - Oliver Charlesworth 2012-04-05 23:37
Air traffic control is NP-Complete... As you string Turing tests together per request, the complexity grows. Start stacking multiple Turing tests together per request and you can't make a predictive algorithm. For instance, string N human-solvable games together like those found on http://areyouahuman.com/. But I mean, the line between human/human-assisted AI and animal AI (like CV) will be blurring. The difference is that as long as there is combinatorial complexity for each submission, you can't write heuristics for it - FlavorScape 2012-04-05 23:49
If I can write an algorithm to solve game A, and a different algorithm to solve game B, etc., then it doesn't matter how many times you throw them at me in a row (so long as I can programmatically identify which game is currently being run). And if the problem is NP-complete, then a human isn't going to be able to give you the optimal solution either - Oliver Charlesworth 2012-04-05 23:54
I'm not saying make the problem itself NP complete, I'm saying make the number of games and the number of combinations of content be difficult in itself to identify or constantly changing. So the inputs->outputs mapping is NP complete so YOU would have to play all of, for instance 986,000 combinations to write the algorithm to solve game A, even after you've written the algorithm to identify the game type. Put A, B and C together and you'd never be able to machine verify all the incorrect answer combinations so you could not automate feature detection - FlavorScape 2012-04-06 00:14
Fact is, you can create AI-complete reverse Turing tests. The problem is, you can't stop spam farms paying kids in developing countries to manually solve them and post ads for Viagra. Those farms wouldn't exist if it were that easy to break good, well thought-out RTTs. You can't detect the motivations of a human easily though - FlavorScape 2012-04-06 00:36