I want to pull the base URL (just the host name) out of input supplied via an HTML text input and read from $_POST. The input will most likely contain the full URI, quite possibly including a port assignment followed by a few more path segments.
example: https://lab1.sfo1.transparentpixel.com:554/rtmp/_definst_
I need up to 3 such results, and those values end up in an array.
So to test things in a stand alone script, I ended up with the following code:
OLD FOR HISTORICAL REVIEW:
<?php
$var1 = "https://lab1.sfo1.transparentpixel.com:1935/rtsp/_definst_";
$var2 = "http://lab1.sfo1.transparentpixel.com:1935/rtmp/_definst_";
$var3 = "lab1.sfo1.transparentpixel.com";
$count = 1;
while ( $count <= 3 )
{
$test[] = 'var'.$count.' = ' . preg_replace(array("#^.*/([^\:]+)\:.*#"), '$1', ${var.$count});
$count++;
}
var_dump($test);
?>
CORRECTED AFTER EDIT:
<?php
$url1 = "https://lab1.sfo1.transparentpixel.com:1935/rtsp/_definst_";
$url2 = "http://lab1.sfo1.transparentpixel.com:1935/rtmp/_definst_";
$url3 = "lab1.sfo1.transparentpixel.com";
$test = array();
$count = 1;
while ( $count <= 3 )
{
    // Variable variable: the name has to be built from a quoted string, e.g. ${'url1'}
    $test[] = 'url'.$count.' = ' . preg_replace('#^.*/([^:]+):.*#', '$1', ${'url'.$count});
    $count++;
}
print_r($test);
?>
My result:
$ php tpixel_url_replace.php
Array
(
[0] => url1 = lab1.sfo1.transparentpixel.com
[1] => url2 = lab1.sfo1.transparentpixel.com
[2] => url3 = lab1.sfo1.transparentpixel.com
)
While this works as I intended, I'm certainly missing some iterations. Anyone care to elucidate things I may be overlooking? Yes, I know I could have used str_replace but the cost of running preg_ over str_ is minimal in the overall scheme of things.
I'm simply looking for insight as I'm 100% sure I'm not a master of anything regarding reg-ex nor preg_replace.
Input?
I hope I understand your question correctly. Are you having trouble with the regex or the code for looping over the urls? Or both?
I'm going to assume both...
Instead of matching the whole thing and grouping the bit you want to extract, I'd suggest you match just what you want to extract. With that in mind, the regex could look like this:
[^/]+\.[^/:]{2,3}
In English this says:
Match anything except a forward slash up to a dot, then match 2 or 3 more characters that are neither a forward slash nor a colon. Because the first part is greedy, the match runs to the last dot that still fits.
This seems simple, but I think it gets you what you need.
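To make the behaviour of that pattern concrete, here is a minimal sketch. The function name match_host and the example.com / example.info inputs are made up for illustration; only the pattern itself comes from above.

```php
<?php
// Hypothetical helper name; the pattern is the one given above, in '%' delimiters.
function match_host($url)
{
    // [^/]+ is greedy, then backtracks to the last dot that is followed
    // by 2 or 3 characters that are neither '/' nor ':'.
    if (preg_match('%[^/]+\.[^/:]{2,3}%', $url, $m)) {
        return $m[0];
    }
    return null; // no dot-separated host found
}

echo match_host('http://example.com:1935/live'), "\n"; // example.com
echo match_host('example.info/live'), "\n";            // example.inf (suffix cut to 3 chars)
```

Note the second case: because of the {2,3} limit, a suffix longer than three characters gets truncated, so you would want to widen that range if you expect such hosts.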
Here is a bit of PHP code that creates an array of URLs in various formats, then loops through each one and extracts just the bit I think you want. I've switched to using preg_match instead of preg_replace because I think it makes more sense in this case:
<?php
$urls = array(
    "https://lab1.sfo1.transparentpixel.co.jp:1935/rtsp/_definst_",
    "http://lab1.sfo1.transparentpixel.com:1935/rtmp/_definst_",
    "http://lab1.sfo1.transparentpixel.com/rtmp/_definst_",
    "lab1.sfo1.transparentpixel.com",
    "someurl.com:1935/rtmp/_definst_",
    "someurl.com/_definst_",
    "http://someurl.co.uk");
foreach($urls as $url)
{
    // preg_match returns 1 on a match; $matches[0] then holds the matched host
    if (preg_match('%[^/]+\.[^/:]{2,3}%', $url, $matches))
    {
        echo $matches[0] . "\n"; // instead of this you could do $test[] = $matches[0];
    }
}
?>
You'll notice that I'm looping over the array using a foreach loop which means we are not limited to a fixed number of iterations as in your example.
The output of this is:
lab1.sfo1.transparentpixel.co.jp
lab1.sfo1.transparentpixel.com
lab1.sfo1.transparentpixel.com
lab1.sfo1.transparentpixel.com
someurl.com
someurl.com
someurl.co.uk
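If you would rather not maintain a regex at all, PHP's built-in parse_url can do the same job. This is only a sketch of a different technique, not what the code above does: the function name host_from_input is made up, and prepending "http://" to scheme-less input is my own assumption so that parse_url recognises the host.

```php
<?php
// Hypothetical helper, shown as an alternative to the regex approach.
function host_from_input($input)
{
    $input = trim($input);
    // Assumption: input without a scheme gets a placeholder "http://"
    // so that parse_url() treats the first part as the host.
    if (!preg_match('%^[a-z][a-z0-9+.-]*://%i', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    // parse_url() gives false or null when no host can be found
    return is_string($host) ? $host : null;
}

$samples = array(
    "https://lab1.sfo1.transparentpixel.co.jp:1935/rtsp/_definst_",
    "someurl.com:1935/rtmp/_definst_",
    "lab1.sfo1.transparentpixel.com",
);
foreach ($samples as $sample) {
    echo host_from_input($sample), "\n";
}
```

This does not care how long the last part of the host name is, but it will also accept hosts without any dot, so pick whichever fits your input better.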
https://:1935/rtsp/_definst_ for the first url). To be honest, the regex change I suggested was only because I think using preg_match in the code is more readable (in my opinion); if yours works and the method makes sense to you, then go with it. One question about your original post: what did you mean by "I'm certainly missing some iterations"? - Robbie 2012-04-05 19:24
http://someurl.com or someurl.com:1935/rtmp/_definst_ - Robbie 2012-04-03 20:52