Videoconferencing is not trivial to implement. You first have to look at a signalling protocol such as SIP or XMPP, which provides the infrastructure to maintain a list of 'friends' and to track whether they are online or offline. The signalling infrastructure also lets you place a call and alerts you when you receive one. Then there is the question of actually streaming video and audio between the two endpoints. There is no single solution that you can simply plug into your app.
You can look at Google's Libjingle or ConferenceXP as a starting point.
To establish a video conference you need the following information:
1. Whom can you call? This is your friends list. You need a mechanism to add friends to your list and to ensure that you can only add friends who are willing to communicate with you.
2. How do you establish a communication channel with your friends? For example, what is their IP address, which video codecs do they support, and so on.
3. Once you have established a way to communicate with your friend, there is the question of receiving the audio and video data and playing it back with the correct timing.
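Point 1 boils down to a permission handshake: you ask to add someone, and they only become callable after they accept. XMPP does this with a `subscribe` presence request answered by a `subscribed` approval. The following is a minimal, hypothetical in-memory model of that idea (the `Roster` class and its method names are made up for illustration, not any library's API):

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Roster:
    """Hypothetical friends list: a contact is callable only after they approve."""
    owner: str
    pending: set[str] = field(default_factory=set)  # requests sent, awaiting an answer
    friends: set[str] = field(default_factory=set)  # contacts who accepted

    def request(self, contact: str) -> None:
        # Ask the contact for permission; nothing becomes callable yet.
        if contact not in self.friends:
            self.pending.add(contact)

    def on_approved(self, contact: str) -> None:
        # Promote the contact only if we actually asked; ignore unsolicited approvals.
        if contact in self.pending:
            self.pending.discard(contact)
            self.friends.add(contact)

    def on_denied(self, contact: str) -> None:
        # The contact refused; drop the outstanding request.
        self.pending.discard(contact)

    def can_call(self, contact: str) -> bool:
        return contact in self.friends
```

For example, after `r = Roster("alice@example.com")` and `r.request("bob@example.com")`, `r.can_call("bob@example.com")` stays `False` until `r.on_approved("bob@example.com")` is called. A real server would also persist the roster and push presence updates, which this sketch leaves out.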
For points 1 and 2 you need what is called a signalling and presence protocol. SIP and XMPP are two very popular open protocols.
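For point 2, SIP typically carries the "where and how" of the call as an SDP session description (RFC 4566) in the body of the INVITE: the IP address to send media to, the ports, and the codecs offered. The sketch below builds and parses a minimal SDP offer. The user name, the documentation-range address 192.0.2.10, the ports and the codec choice (PCMU audio, H.264 video on dynamic payload type 96) are all illustrative assumptions:

```python
def build_sdp_offer(user: str, ip: str, audio_port: int, video_port: int) -> str:
    """Build a minimal SDP offer with one audio and one video stream (illustrative values)."""
    lines = [
        "v=0",
        f"o={user} 1 1 IN IP4 {ip}",       # origin: user, session id, version, address
        "s=video call",                      # session name
        f"c=IN IP4 {ip}",                    # where the peer should send media
        "t=0 0",                             # unbounded session time
        f"m=audio {audio_port} RTP/AVP 0",   # payload type 0 is PCMU (static)
        "a=rtpmap:0 PCMU/8000",
        f"m=video {video_port} RTP/AVP 96",  # 96 is a dynamic payload type
        "a=rtpmap:96 H264/90000",
    ]
    return "\r\n".join(lines) + "\r\n"       # SDP lines end with CRLF


def parse_sdp(sdp: str) -> dict:
    """Extract the connection address, media ports and rtpmap codecs from an SDP body."""
    info = {"ip": None, "media": {}}
    current = None
    for line in sdp.splitlines():
        kind, _, value = line.partition("=")
        if kind == "c":
            info["ip"] = value.split()[-1]
        elif kind == "m":
            media, port, _proto, *_payload_types = value.split()
            current = media
            info["media"][media] = {"port": int(port), "codecs": {}}
        elif kind == "a" and value.startswith("rtpmap:") and current:
            pt, codec = value[len("rtpmap:"):].split(" ", 1)
            info["media"][current]["codecs"][int(pt)] = codec
    return info
```

The callee would compare the parsed codecs against what it supports and reply with its own SDP answer (the offer/answer model of RFC 3264). XMPP's Jingle extension negotiates the same kind of information in XML instead of SDP.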
For point 3 you would look at a protocol like RTP.
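RTP solves the timing problem from point 3 by stamping every packet with a sequence number (to detect loss and reordering) and a media timestamp (to schedule playout). The sketch below packs and unpacks the 12-byte fixed RTP header from RFC 3550, assuming no padding, no header extension and no CSRC list. It is not a full RTP stack: there is no RTCP, jitter buffer or payload packetization:

```python
import struct

RTP_VERSION = 2
VIDEO_CLOCK_HZ = 90_000  # clock rate conventionally used by RTP video payload formats


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Pack the 12-byte RTP fixed header (P=0, X=0, CC=0)."""
    byte0 = RTP_VERSION << 6                            # V=2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit + 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def video_timestamp(start_ts: int, seconds: float) -> int:
    """Convert elapsed media time to a 32-bit RTP timestamp on the 90 kHz clock."""
    return (start_ts + round(seconds * VIDEO_CLOCK_HZ)) & 0xFFFFFFFF
```

At 30 frames per second, consecutive video frames are 3000 ticks apart, so the receiver can recover the original frame spacing from timestamp differences even when packets arrive with uneven network delay.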
You can google SIP, XMPP and RTP and you will find a wide variety of literature. The RFC documents give the exact details, but they can be a bit arcane.
Libjingle is a C++ library that implements XMPP and RTP.
ConferenceXP is an RTP implementation in C# with some basic signalling; I think you can get started with its examples without going too deep into the details.